Don't parse .db files ourselves; use pyalpm instead
Created by: LukeShu
In a patchset that I recently submitted, Eli was concerned that I was parsing .db files with bsdtar+awk, when the format of .db files isn't "public"; the only guarantees made about it are that libalpm can parse it.
https://lists.archlinux.org/pipermail/arch-projects/2018-June/004932.html
I wasn't too concerned, because ftpdir-cleanup and sourceballs already parse the .db files in the same way. Nonetheless, I think Eli is right: we shouldn't be parsing these files ourselves.
So, add a dbquery function that uses pyalpm to parse the .db files:
- It takes as arguments two Python 3 expressions:
  - one that returns a bool deciding whether we want to print information on a package, and
  - another that returns the string to print for a package.

  Currently, all callers use "True" for the decider expression, as ftpdir-cleanup and sourceballs operate on every package. However, I'm including a way to filter packages because I'm coming at this from the context that I want to parse .db files in other places too.
- libalpm doesn't offer an easy way to say "parse this DB file for me"; instead, we must construct a configuration that has a syncdb pointing to that file, which we then have it sync into a temporary directory.
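
To make the two points above concrete, here is a minimal sketch of the idea in Python; it is not the code from the patch. The names `query_packages` and `dbquery_sketch`, and the convention that expressions refer to the current package as `pkg`, are assumptions made for illustration. The pyalpm part assumes the syncdb name is the .db file name up to the first dot, that signature checking can be disabled by passing `0`, and that libalpm is able to fetch from a `file://` server.

```python
import os
import tempfile
from types import SimpleNamespace


def query_packages(packages, decider, formatter):
    """Return the formatter's string for every package the decider accepts.

    Both arguments are Python 3 expression strings. The current package is
    exposed to them under the (assumed) name `pkg`.
    """
    lines = []
    for pkg in packages:
        if eval(decider, {"pkg": pkg}):
            lines.append(str(eval(formatter, {"pkg": pkg})))
    return lines


def dbquery_sketch(dbfile, decider, formatter):
    """Hypothetical wrapper: let libalpm parse `dbfile`, then query it."""
    import pyalpm  # imported lazily so the demo below runs without pyalpm

    dbfile = os.path.abspath(dbfile)
    # Assumption: the repo name is the file name up to the first dot.
    name = os.path.basename(dbfile).split(".")[0]
    with tempfile.TemporaryDirectory() as dbpath:
        handle = pyalpm.Handle("/", dbpath)
        db = handle.register_syncdb(name, 0)  # 0: no signature checks (assumed)
        # Point the syncdb at the directory holding the .db file, then have
        # libalpm sync it into the temporary dbpath.
        db.servers = ["file://" + os.path.dirname(dbfile)]
        db.update(False)
        # Query while the handle and the temporary directory still exist.
        return query_packages(db.pkgcache, decider, formatter)


# Stand-in objects instead of real pyalpm packages, for demonstration only.
fake_pkgs = [
    SimpleNamespace(name="foo", version="1.0-1", licenses=["GPL"]),
    SimpleNamespace(name="bar", version="2.3-1", licenses=["custom:some license"]),
]

# Decider "True" selects every package, as the current callers do.
everything = query_packages(fake_pkgs, "True", "pkg.name + ' ' + pkg.version")

# A non-trivial decider: only packages whose license contains a space.
spaced = query_packages(
    fake_pkgs, "any(' ' in lic for lic in pkg.licenses)", "pkg.name"
)
```

Because the expressions are passed to `eval`, this is only appropriate when the callers are trusted scripts, which is the case for ftpdir-cleanup and sourceballs.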
As a final note, when rewriting the bit of sourceballs to use dbquery instead of AWK, I realized that it does not correctly handle licenses that have a space in them (as of 2018-07-07 there are 67 packages in the Arch repos that have a license containing a space). I did not fix this bug; I merely translated it from AWK to Python, as fixing it would also require adjustments elsewhere in the program.