Speed up process of generating metadata archives
-
I have summed up the feature in concise words in the Summary section. -
I have completely described the feature in the Description section. -
I have completed the Blockers section.
Summary
Use mysql/mariadb built-in json functions to compose metadata information.
Description
The metadata archive could be generated using pure SQL. This approach might be faster than the current python script.
To be tested.
Example:
The following SQL statement creates a view that should produce the same output as mkpkglists for the packages-meta-ext-v1
archive
create view vPackageInfo as
select '' Name, '[' pkginfo, 1 sort
union
select p.Name, concat(json_object(
'ID', p.ID,
'Name', p.Name,
'PackageBaseID', pb.ID,
'PackageBase', pb.Name,
'Version', p.Version,
'Description', p.Description,
'URL', p.URL,
'NumVotes', cast((select count(1) from PackageVotes pv where pv.PackageBaseID = p.PackageBaseID) as int),
'Popularity', round(pb.Popularity, 2),
'OutOfDate', pb.OutOfDateTS,
'Maintainer', mu.Username,
'FirstSubmitted', pb.SubmittedTS,
'LastModified', pb.ModifiedTS,
'URLPath', concat('/cgit/aur.git/snapshot/', pb.Name, '.tar.gz'),
'Depends', (select json_arrayagg(concat(d.DepName, ifnull(d.DepCondition,''))) from PackageDepends d where d.PackageID = p.ID and d.DepTypeID = 1),
'MakeDepends', (select json_arrayagg(concat(md.DepName, ifnull(md.DepCondition,''))) from PackageDepends md where md.PackageID = p.ID and md.DepTypeID = 2),
'CheckDepends', (select json_arrayagg(concat(cd.DepName, ifnull(cd.DepCondition,''))) from PackageDepends cd where cd.PackageID = p.ID and cd.DepTypeID = 3),
'OptDepends', (select json_arrayagg(concat(od.DepName, ifnull(od.DepCondition,''))) from PackageDepends od where od.PackageID = p.ID and od.DepTypeID = 4),
'Conflicts', (select json_arrayagg(concat(rc.RelName, ifnull(rc.RelCondition,''))) from PackageRelations rc where rc.PackageID = p.ID and rc.RelTypeID = 1),
'Provides', (select json_arrayagg(concat(rp.RelName, ifnull(rp.RelCondition,''))) from PackageRelations rp where rp.PackageID = p.ID and rp.RelTypeID = 2),
'Replaces', (select json_arrayagg(concat(rr.RelName, ifnull(rr.RelCondition,''))) from PackageRelations rr where rr.PackageID = p.ID and rr.RelTypeID = 3),
'Groups', (select json_arrayagg(g.Name) from Groups g join PackageGroups pg on g.ID = pg.GroupID where pg.PackageID = p.ID),
'License', (select json_arrayagg(l.Name) from Licenses l join PackageLicenses pl on l.ID = pl.LicenseID where pl.PackageID = p.ID),
'Keywords', (select json_arrayagg(pk.Keyword) from PackageKeywords pk where pk.PackageBaseID = p.PackageBaseID)
), case when p.Name = (select max(Name) from Packages pm) then '' else ',' end) pkginfo, 2
from Packages p
join PackageBases pb on pb.ID = p.PackageBaseID
left join Users mu on mu.ID = pb.MaintainerUID
where pb.PackagerUID is not null
union select '', ']', 3
Data can then be dumped to a file with
mysql -N -r -e 'select pkginfo from vPackageInfo order by sort, Name' aurweb > meta.json
One other thing I've noticed during testing is that creating the following two indexes improved query time from 4.7s to 3.3s:
CREATE INDEX PackageID_DepTypeID USING BTREE ON aurweb.PackageDepends (PackageID,DepTypeID);
CREATE INDEX PackageID_RelTypeID USING BTREE ON aurweb.PackageRelations (PackageID,RelTypeID);
Maybe the current process will also benefit from those...