RFC for new AUR Metadata Archives format + diffs
Schema
So here is a detailed of description of a new metadata format that I believe to be the most efficient to use for both the client and AUR.
The archive is a sorted list of one line per package in the format:
<pkname><space><pkgjson>
Example:
bubblemon {"ID":208446,"Name":"bubblemon","PackageBaseID":86,"PackageBase":"bubblemon","Version":"2.0.15-2","Description":"Bubbling Load Monitoring Applet for the GNOME Panel","URL":"http://www.nongnu.org/bubblemon/","NumVotes":5,"Popularity":0.0,"OutOfDate":1535516100,"Maintainer":"encelo","Fgno3dtet irstSubmitted":1113183215,"LastModified":1436303557,"URLPath":"/cgit/aur.git/snapshot/bubblemon.tar.gz"},
gno3dtet {"ID":208460,"Name":"gno3dtet","PackageBaseID":110,"PackageBase":"gno3dtet","Version":"1.96.1-2","Description":"Slick top-down 3D Tetris game","URL":"http://eseb.net/3dtetris.php","NumVotes":2,"Popularity":0.0,"OutOfDate":null,"Maintainer":"encelo","FirstSubmitted":1113188204,"LastModified":1436303711,"URLPath":"/cgit/aur.git/snapshot/gno3dtet.tar.gz"},
rdiff-backup-devel {"ID":204012,"Name":"rdiff-backup-devel","PackageBaseID":173,"PackageBase":"rdiff-backup-devel","Version":"1.3.3-2","Description":"A utility for local/remote mirroring and incremental backups","URL":"http://www.nongnu.org/rdiff-backup/","NumVotes":9,"Popularity":0.0,"OutOfDate":null,"Maintainer":"Dragonlord","FirstSubmitted":1113230185,"LastModified":1435508624,"URLPath":"/cgit/aur.git/snapshot/rdiff-backup-devel.tar.gz"},
I've used space as a delim here but it could be anything.
This allows a program to read the data in and search for pkgnames it wants without parsing any json or allocated memory until it finds the packages it wants.
If the program needs the json data it may for each line, parse the json into an existing buffer, meaning no memory is used (except when needed to grow the buffer with realloc).
Diffs
Diffs take the same format except a line may instead be -pkgname to indicate a package was removed:
Example:
-aur
bubblemon {"ID":208446,"Name":"bubblemon","PackageBaseID":86,"PackageBase":"bubblemon","Version":"2.0.15-2","Description":"Bubbling Load Monitoring Applet for the GNOME Panel","URL":"http://www.nongnu.org/bubblemon/","NumVotes":5,"Popularity":0.0,"OutOfDate":1535516100,"Maintainer":"encelo","Fgno3dtet irstSubmitted":1113183215,"LastModified":1436303557,"URLPath":"/cgit/aur.git/snapshot/bubblemon.tar.gz"},
gno3dtet {"ID":208460,"Name":"gno3dtet","PackageBaseID":110,"PackageBase":"gno3dtet","Version":"1.96.1-2","Description":"Slick top-down 3D Tetris game","URL":"http://eseb.net/3dtetris.php","NumVotes":2,"Popularity":0.0,"OutOfDate":null,"Maintainer":"encelo","FirstSubmitted":1113188204,"LastModified":1436303711,"URLPath":"/cgit/aur.git/snapshot/gno3dtet.tar.gz"},
-pacman-git
rdiff-backup-devel {"ID":204012,"Name":"rdiff-backup-devel","PackageBaseID":173,"PackageBase":"rdiff-backup-devel","Version":"1.3.3-2","Description":"A utility for local/remote mirroring and incremental backups","URL":"http://www.nongnu.org/rdiff-backup/","NumVotes":9,"Popularity":0.0,"OutOfDate":null,"Maintainer":"Dragonlord","FirstSubmitted":1113230185,"LastModified":1435508624,"URLPath":"/cgit/aur.git/snapshot/rdiff-backup-devel.tar.gz"},
-spotify
-yay
The AUR will store diffs aged, 5 mins, 1 hour, 1 day, 1 week 1 month 1 year. To be generated every 5 minutes.
When the client requests a diff we send the diff that is >= the age they specified.
User says they are 3 days out of date -> we send the week old diff.
User says they are 2 years our of date -> we send the whole package list.
It does not matter that the diff we send is older than what they want as they can ignore deletes they don't have and process a changed package that hasn't actually changed as usual.
Merging
As both the current state and diff are sorted we can merge the lists by doing the following.
function get_pkgname(line) {
if line[0] == '-' {
line = line[1:]
}
return line.split_once(" ")[0]
}
var list
var diff
var new_list
for line in list:
pkgname = get_pkgname(line)
while true {
curr_diff = diff.next()
diff_pkgname = get_pkgname(diff)
if diff_pkgname < pkgname {
# not there yet so keep searching
continue
} else {
# either we have an exact match or we didnt find the value in the list
# write the new value either way
# eat value if starts with -
if diff[0] != '-' {
# write new value
new_list.write(diff)
}
break
}
}
This is O(n), only loads one line from both lists at a time, does 0 json parsing.
This also assumes you would write the new_list to a new file then atomically mv new_list onto old_list.