- Dec 04, 2023
-
-
Kristian Klausen authored
Without index.php it ends up downloading the new landing page, which describes that GitLab is the future!
-
- Dec 02, 2023
-
-
Kristian Klausen authored
get_newest_task_id returns 79982, which is not the newest task ID; 80347 is.
-
- Nov 25, 2023
-
-
Kristian Klausen authored
More than two years ago, running the script against bugs.archlinux.org became very slow. I'm not sure what caused it, but let's try reducing the number of tasks per run and see whether it can finish without hitting the job timeout (2 hours).
-
- Apr 07, 2021
-
-
Kristian Klausen authored
Introduced in: c654db96 ("Fix error out when the attachment is named tmp")
-
Kristian Klausen authored
This can happen if the filename is invalid, e.g. https://bugs.archlinux.org/task/6698 contains an attachment with the filename ".", which wget can't write to:
.: Is a directory
Cannot write to '.' (Is a directory).
-
Kristian Klausen authored
It was broken by: 406c3b5f ("Wait between retrievals to keep load down")

At the same time, fix an issue where the first URL isn't downloaded[1].

[1] https://stackoverflow.com/questions/41043163/xargs-sh-c-skipping-the-first-argument
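The skipped first URL described above follows from how `sh -c` assigns positional parameters: the first word after the command string becomes `$0`, not `$1`. A minimal sketch, assuming a POSIX `sh` is on the PATH; the URL strings and the `_` placeholder are illustrative and not taken from the script:

```python
import subprocess

def run_sh(args):
    """Run sh -c 'echo "$@"' with the given extra arguments and return its output."""
    result = subprocess.run(
        ["sh", "-c", 'echo "$@"', *args],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()

urls = ["url1", "url2", "url3"]  # hypothetical stand-ins for task URLs

# Without a placeholder, the first URL is consumed as $0 and never reaches "$@".
skipped = run_sh(urls)

# With a dummy first argument, every URL is passed through.
complete = run_sh(["_", *urls])

print(skipped)
print(complete)
```

The same applies when `xargs` appends its arguments to `sh -c '...'`, which is why a dummy argument is commonly placed directly after the command string.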
-
Kristian Klausen authored
-
- Apr 06, 2021
-
-
Kristian Klausen authored
-
- Apr 05, 2021
-
-
Kristian Klausen authored
If an attachment with id 1234 and 12345 exists for the same task, the 12345 attachment could be selected for both.
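The mix-up described above is what a plain substring match produces when one ID is a prefix of another. A minimal sketch, assuming hypothetical link strings of the form `index.php?getfile=<id>`; the helper names and the matching strategy are illustrative, not the script's actual code:

```python
import re

# Hypothetical attachment links for one task; the real markup may differ.
links = [
    "index.php?getfile=12345",
    "index.php?getfile=1234",
]

def find_loose(attachment_id):
    """Substring match: '1234' also matches inside '12345'."""
    return next(link for link in links if f"getfile={attachment_id}" in link)

def find_exact(attachment_id):
    """Require that no further digit follows the ID."""
    pattern = re.compile(rf"getfile={attachment_id}(?!\d)")
    return next(link for link in links if pattern.search(link))

loose = find_loose(1234)   # picks the 12345 link, because it comes first
exact = find_exact(1234)   # picks the 1234 link

print(loose)
print(exact)
```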
-
Kristian Klausen authored
-
Kristian Klausen authored
sort --unique combined with --numeric-sort only checks whether the first number ($task_id) is unique, which means that only the first attachment is downloaded. Fixed by switching to uniq for filtering out duplicates.
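The difference between the two approaches can be emulated in a few lines. This is a rough Python model of the behaviour described above, not a reimplementation of GNU sort; the "task_id attachment_id" line format is an assumption made for illustration:

```python
def leading_number(line):
    """Return the leading integer of a line, roughly the key --numeric-sort compares."""
    digits = ""
    for ch in line.lstrip():
        if not ch.isdigit():
            break
        digits += ch
    return int(digits) if digits else 0

def unique_by_numeric_key(lines):
    """Model of sort --numeric-sort --unique: keep one line per numeric key."""
    kept = {}
    for line in lines:
        kept.setdefault(leading_number(line), line)
    return [kept[key] for key in sorted(kept)]

def unique_by_whole_line(lines):
    """Model of sort --numeric-sort | uniq: drop only identical adjacent lines."""
    result = []
    for line in sorted(lines, key=lambda l: (leading_number(l), l)):
        if not result or result[-1] != line:
            result.append(line)
    return result

# Hypothetical input: task 6698 has two attachments, one listed twice.
lines = ["6698 222", "6698 111", "3786 333", "6698 111"]

by_key = unique_by_numeric_key(lines)
by_line = unique_by_whole_line(lines)

print(by_key)
print(by_line)
```

In the first variant only one attachment per task survives; in the second, both attachments of task 6698 are kept and only the exact duplicate is dropped.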
-
Kristian Klausen authored
Rewriting the href only when the attachment hasn't been downloaded isn't enough; we need to do it every time.
-
Kristian Klausen authored
Ex: https://bugs.archlinux.org/task/29604
-
Kristian Klausen authored
-
Kristian Klausen authored
-
Kristian Klausen authored
https://bugs.archlinux.org/task/6698 contains an attachment with the filename ".", which wget can't write to:
.: Is a directory
Cannot write to '.' (Is a directory).
-
Kristian Klausen authored
Ex: .config from https://bugs.archlinux.org/task/3786
-
Kristian Klausen authored
Example:
ALBS commit: 7afef906 ("Add myself as maintainer")
Tasks download started: 2021-04-05T00:46:44+02:00
Task download finished: 2021-04-05T00:46:51+02:00
Attachments download started: 2021-04-05T00:46:51+02:00
Attachments download finished: 2021-04-05T00:46:51+02:00
-
- Apr 04, 2021
-
-
Kristian Klausen authored
-
Kristian Klausen authored
-
Kristian Klausen authored
The old approach didn't work for the reasons listed in: 5ff17f7c ("Revert "Disable download of attachments by adding a --reject pattern"")

Fix #1
-
Kristian Klausen authored
Fixes: 893ce8c1 ("Ignore task hrefs with query parameters for now")
-
Kristian Klausen authored
Let's try with 0.05s first.
-
Kristian Klausen authored
Ex: https://bugs.archlinux.org/task/69718, which links to https://bugs.archlinux.org/task/69462?string=tt-rss. Ideally we should rewrite it to "69462#string=tt-rss".
-
Kristian Klausen authored
This seems to happen when a comment links to a comment in another task[1].

Before: 20596#comment65592.html
After: 20596.html#comment65592

[1] https://bugs.archlinux.org/task/21194
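The before/after pair above amounts to moving a misplaced ".html" suffix in front of the fragment. A minimal sketch of such a rewrite; the regular expression and the function name are illustrative assumptions, not the script's actual implementation:

```python
import re

# Matches "<task id>#<fragment>.html", i.e. ".html" appended after the fragment.
MISPLACED_SUFFIX = re.compile(r"^(\d+)#(.+)\.html$")

def fix_comment_href(href):
    """Move '.html' from the end of the fragment to the end of the task ID."""
    return MISPLACED_SUFFIX.sub(r"\1.html#\2", href)

fixed = fix_comment_href("20596#comment65592.html")
unchanged = fix_comment_href("20596.html")  # no fragment, left as-is

print(fixed)
print(unchanged)
```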
-
Kristian Klausen authored
It does more harm than good:
* Some comment links aren't converted[1]
* Old http://bugs.archlinux.org/task/ links aren't downloaded due to --max-redirect 0
* Links with query parameters are downloaded multiple times, ex: https://bugs.archlinux.org/task/69462?string=tt-rss from [2]

This reverts commit 94b035d4.

[1] https://bugs.archlinux.org/task/13059
[2] https://bugs.archlinux.org/task/69718
-
- Apr 03, 2021
-
-
Kristian Klausen authored
This means we can add "task" to --include-directories and remove the href="https://bugs.archlinux.org/task/<id>" to "<id>.html" transform logic as wget is handling it.
-
Kristian Klausen authored
So Flyspray (apparently) enjoys putting a newline character in the middle of hrefs, which gets translated to "%0A", so let's remove it.
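Removing the encoded newline described above is a plain string substitution. A minimal sketch; the function name and the sample href are hypothetical, and handling the lowercase "%0a" form is an added assumption:

```python
def strip_encoded_newline(href):
    """Drop percent-encoded newline characters from an href."""
    return href.replace("%0A", "").replace("%0a", "")

# Hypothetical href with an encoded newline in the middle.
broken = "https://bugs.archlinux.org/task/%0A1234"
cleaned = strip_encoded_newline(broken)

print(cleaned)
```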
-
Kristian Klausen authored
wget only does this if "--include-directories" contains "task" ("download attachment" = true) or if the task was already downloaded (or is in the url list (not sure)). Ref #2
-
- Apr 02, 2021
-
-
Kristian Klausen authored
The project was moved into the Arch Linux group.
-
Kristian Klausen authored
As we are now only processing ~7k tasks per run, I'm not too worried by its slowness.
-
Kristian Klausen authored
Before, it would only match if "download attachment" was true, and it would also match other <a> elements with "https://bugs.archlinux.org".
-
Kristian Klausen authored
-
Kristian Klausen authored
-
Kristian Klausen authored
Set ALBS_PARALLELIZE_DOWNLOAD=true to enable. It doesn't seem to improve performance, just increase the load on bugs.al.org.
-
Kristian Klausen authored
-
Kristian Klausen authored
-
Kristian Klausen authored
-