
Arch Linux Bugs Snapshotter (ALBS)

With the sunsetting of bugs.archlinux.org on the horizon (give or take a few years), it is time to think about how to archive it so that it remains accessible to future generations.

This is a take on it! :)

Snapshots

The snapshots branch contains a snapshot of https://bugs.archlinux.org, which is updated regularly.

Usage

First install wget, rsync, libxslt and prettier, then run:

$ ./snapshotter.sh [maximum number of tasks to download] [download attachment: true (default) or false] [prettify the HTML files: true (default) or false] [download dir, default: snapshots/2021-04-01T22:52+02:00]

How It Works

  1. https://bugs.archlinux.org/index.php?project=0&status[]=&changedfrom=2021-04-01 is scraped to get the newest task id
  2. The range of tasks to download is decided:
    • If $ALBS_RANGE_DOWNLOAD_ENABLED = true then:
      • A range of tasks is computed based on $ALBS_RANGE_DOWNLOAD_CHUNK and $ALBS_RANGE_DOWNLOAD_CHUNKS
    • else:
      • $min=0
      • $max=$new_task_id
  3. A list of URLs is generated: https://bugs.archlinux.org/task/{$min..$max}
  4. wget starts downloading the URLs, including page requisites and linked user pages
  5. xsltproc is run on all the HTML files to clean up the HTML (remove navbar entries, login form etc.)
  6. prettier is run on all the HTML files to prettify the HTML (primarily fixing indentation)
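
As a rough illustration of steps 1 and 3, here is a minimal sketch that is not taken from snapshotter.sh. The helper names newest_task_id and task_urls are hypothetical, and the assumption that the scraped page links to tasks as "task/<id>" is a guess based on the URL pattern in step 3. A loop is used because bash does not expand variables inside a brace range such as {$min..$max}. The chunked range from step 2 is not shown, since its exact computation is not described here.

```shell
# Hypothetical helper (assumption: task links in the page look like "task/<id>").
# Reads HTML on stdin and prints the largest task id found.
newest_task_id() {
  grep -o 'task/[0-9][0-9]*' | cut -d/ -f2 | sort -n | tail -n 1
}

# Hypothetical helper: print one task URL per line for every id in the
# inclusive range [$1, $2]. Uses the global variable "id" as its counter.
task_urls() {
  id=$1
  while [ "$id" -le "$2" ]; do
    printf 'https://bugs.archlinux.org/task/%s\n' "$id"
    id=$((id + 1))
  done
}

# Stand-in for the scraped index page; the real script downloads it.
sample_html='<a href="/task/41">A</a> <a href="/task/43">B</a>'
new_task_id=$(printf '%s\n' "$sample_html" | newest_task_id)

# Non-ranged case from step 2: min=0, max=newest task id.
# Only the last two URLs are shown: .../task/42 and .../task/43
task_urls 0 "$new_task_id" | tail -n 2
```

A URL list produced this way could be written to a file and handed to wget, which is roughly what step 4 describes.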

Cloning

If you don't need the snapshots branch, you can do a partial clone and avoid downloading all the blobs needed by that branch:

$ git clone --filter=blob:none https://gitlab.archlinux.org/archlinux/archlinux-bugs-snapshotter.git

The snapshots branch also contains all the attachments. You can avoid downloading those as well by using sparse-checkout:

$ git sparse-checkout set '/*' '!/attachments/'
$ git sparse-checkout init
$ git checkout snapshots

Maintainer

ALBS is written and maintained by Kristian Klausen.