diff --git a/.gitmodules b/.gitmodules
new file mode 100644
index 0000000000000000000000000000000000000000..fb394de7276b15d1d9fabdb57e46e82eafce6650
--- /dev/null
+++ b/.gitmodules
@@ -0,0 +1,3 @@
+[submodule "archlinux-common-style"]
+ path = archlinux-common-style
+ url = https://gitlab.archlinux.org/archlinux/archlinux-common-style.git
diff --git a/README.md b/README.md
index d8d4448ea1d6c6c61c6de211e3dc5c05e0b14d5f..5ad15bdabb4ee015242456885ef886f4aab4a1b6 100644
--- a/README.md
+++ b/README.md
@@ -6,11 +6,11 @@
## Installation
-1. In the directory `mysite` copy `local_settings.py.example` to `local_settings.py` and edit `DEBUG = True` and the `SECRET_KEY` variable.
+1. Copy `local_settings.py.example` to `local_settings.py`, then set `DEBUG = True` and change the `SECRET_KEY` variable.
2. Configure a connection to a [PostgreSQL](https://wiki.archlinux.org/index.php/PostgreSQL) database
- in the [Django database settings](https://docs.djangoproject.com/en/1.11/ref/settings/#databases)
- in the `mysite/local_settings.py` file.
+ in the [Django database settings](https://docs.djangoproject.com/en/3.1/ref/settings/#databases)
+ in the `local_settings.py` file.
3. Make sure that the [pg_trgm](https://www.postgresql.org/docs/current/pgtrgm.html)
extension is [created](https://www.postgresql.org/docs/current/sql-createextension.html)
@@ -38,3 +38,161 @@
for the development, you can run e.g. `update.py --only-repos core` to import
only man pages from the core repository (the smallest one, download size is
about 160 MiB) or even `update.py --only-packages coreutils man-pages`.
+
+## About
+
+This website was created for the [man template](https://wiki.archlinux.org/index.php/Template:Man)
+on the Arch wiki. Originally, the template replaced plain text, unclickable
+references to man pages with links to [man7.org](https://man7.org/linux/man-pages/),
+which contains a handful of manuals taken directly from upstream. Later, we
+considered switching to another site providing more manuals. Since we did not
+find a suitable external site, we decided to build a new service to satisfy all
+our requirements:
+
+1. All man pages from official Arch packages are available. Old versions and
+ permalinks are not necessary.
+2. Functionality does not require JavaScript.
+3. Pages are addressable by their name and section, both occurring exactly once
+ in the URL to avoid problems with pages such as
+ [ar(1)](https://jlk.fjfi.cvut.cz/arch/manpages/man/ar.1) and
+ [ar(1p)](https://jlk.fjfi.cvut.cz/arch/manpages/man/ar.1p).
+4. The URLs used by the _man_ template should not redirect to permalinks,
+ otherwise users would start copy-pasting them to the wiki and it would be
+ hard to check if they are the same as the canonical URLs.
+5. Human-readable subsection anchors.
+6. The page should clearly indicate the Arch package version containing the
+ page.
+
+See the [original discussion](https://wiki.archlinux.org/index.php/Template_talk:Man#Sources)
+for details.
+
+We used a dynamic approach instead of building a website consisting of
+completely static pages. The main building blocks are the
+[Django web framework](https://www.djangoproject.com/), the
+[PostgreSQL](https://www.postgresql.org/) database server, the `mandoc` tool
+from the [mandoc toolset](http://mdocml.bsd.lv/) for the conversion to HTML and
+the [pyalpm](https://github.com/archlinux/pyalpm) library for data extraction
+from the Arch repositories. The code is available in the
+[archmanweb](https://gitlab.archlinux.org/archlinux/archmanweb) repository on
+the Arch Linux GitLab instance.
+
+Overall, this approach allows us to provide the following features without
+rebuilding the whole website from scratch:
+
+- Listings with custom filters and orderings.
+- Links to other versions of the same manual provided by different packages.
+- Links to similar manuals available in other sections or languages.
+- Searching in the names and descriptions of packages and manuals, similarly to
+ [apropos(1)](https://jlk.fjfi.cvut.cz/arch/manpages/about).
+
+### Similar projects
+
+Some similar projects, each using a different approach, are:
+
+- [manned.org](https://manned.org/) ([code](https://g.blicky.net/manned.git/),
+ [Arch BBS thread](https://bbs.archlinux.org/viewtopic.php?id=145382))
+- [man7.org](http://man7.org/linux/man-pages/) (the scripts behind the website are unknown to us)
+- [manpages.debian.org](https://manpages.debian.org/)
+ ([source](https://github.com/Debian/debiman/))
+- [man.openbsd.org](http://man.openbsd.org/) (runs with the mandoc CGI script)
+
+## Test cases
+
+These links serve as test cases to ensure that all features still work; they
+are not useful to regular users.
+
+### URLs with dots
+
+- intro
+- intro.1
+- intro.1.en
+- intro.en
+- systemd.service
+- systemd.service.5
+- systemd.service.5.en
+- systemd.service.en
+- gimp-2.8
+- gimp-2.8.1
+- gimp-2.8.1.en
+- gimp-2.8.en
+- CA.pl
+- CA.pl.1ssl
+- CA.pl.1ssl.en
+- CA.pl.en
+
+### Best match lookup
+
+Ambiguous cases are ordered by section, package repository and package version,
+then the first manual is selected.
+
+- mount redirects to
+ mount.8
+ (not mount.2)
+- gv redirects to
+ gv.1
+ (not gv.3guile,
+ gv.3lua etc.)
+- graphviz/gv redirects to
+ graphviz/gv.3guile
+ (not graphviz/gv.3lua etc.)
+- gv.3 redirects to
+ gv.3guile
+ (not gv.1,
+ gv.3lua etc.)
+- aliases.5 displays
+ extra/postfix/aliases.5
+ (not community/opensmtpd/aliases.5)
+- mysqld.8 displays
+ extra/mariadb/mysqld.8
+ (not community/percona-server/mysqld.8)
+- mailx and
+ mailx.1 redirect to
+ mail.1.en as a symbolic link
+ (not mailx.1p)
+
+### Language fallback
+
+- nvidia-smi.cs →
+ nvidia-smi.en →
+ nvidia-smi.1.en
+ (maybe we should try harder and avoid the double redirect)
+- nvidia-smi.1.cs →
+ nvidia-smi.1.en
+- nvidia-smi.foo → 404
+- nvidia-smi.1.foo → 404
+
+### Package filter
+
+- nvidia-utils/nvidia-smi.en
+- nvidia-340xx-utils/nvidia-smi.en
+- nvidia-utils/nvidia-smi.cs →
+ nvidia-utils/nvidia-smi.en
+- nvidia-340xx-utils/nvidia-smi.cs →
+ nvidia-340xx-utils/nvidia-smi.en
+- foo/nvidia-smi.cs → 404
+- foo/nvidia-smi.en → 404
+
+### .so macros
+
+There is a groff(1) extension for the
+man(7) and
+mdoc(7)
+languages to include contents of other files using the `.so` macro. In normal
+operation where manuals are stored as files on a file system, the
+soelim(1)
+pre-processor handles the inclusion. Our system is based on a database rather
+than a file system, so we need a custom `soelim` as well.
+
+Some pages which contain the `.so` macro:
+
+- [.1.zh_CN
+- pwunconv(8)
+- pam(8)
+- url(7)
+- xorg.conf.d(5)
+- glibc(7)
+- systemd-logind(8)
+- shorewall6.conf(5)
+ points to a page contained in a different package (`shorewall` instead of `shorewall6`)
+- lsof(8)
+ (not a "hardlink", includes an invalid file `./00DIALECTS`)
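The "Best match lookup" rule in the README hunk above (order by section, package repository and package version, then take the first manual) can be sketched as a sort key. This is a minimal illustration, not the code archmanweb uses: the `Candidate` type, the `SECTION_ORDER` and `REPO_ORDER` priority lists and the newest-version-first tie-break are all assumptions, chosen only so that the listed examples come out as described.

```python
from dataclasses import dataclass

# Hypothetical priority lists; assumptions chosen to reproduce the README examples.
SECTION_ORDER = ["1", "8", "3", "2", "5", "4", "9", "6", "7"]
REPO_ORDER = ["core", "extra", "community"]


@dataclass(frozen=True)
class Candidate:
    repo: str
    package: str
    name: str
    section: str     # e.g. "1", "1p", "3guile"
    version: tuple   # pre-parsed numeric version, e.g. (2, 36, 1)


def _rank(value, order):
    """Position of value in order; unknown values sort last."""
    return order.index(value) if value in order else len(order)


def sort_key(c):
    base, suffix = c.section[:1], c.section[1:]
    return (
        _rank(base, SECTION_ORDER),    # section number by priority
        suffix,                        # "" < "guile" < "lua" < "p"
        _rank(c.repo, REPO_ORDER),     # core before extra before community
        tuple(-v for v in c.version),  # assumption: newer version first
    )


def best_match(candidates):
    """Return the first candidate in sort order."""
    return min(candidates, key=sort_key)
```

With this key, a bare section suffix sorts before any longer one, so `mailx.1` is preferred over `mailx.1p`, and `gv.3guile` is preferred over `gv.3lua` purely alphabetically.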
diff --git a/archlinux-common-style b/archlinux-common-style
new file mode 160000
index 0000000000000000000000000000000000000000..fe41472481348017b99e31f205235cdcaa0d556f
--- /dev/null
+++ b/archlinux-common-style
@@ -0,0 +1 @@
+Subproject commit fe41472481348017b99e31f205235cdcaa0d556f
diff --git a/archmanweb/management/commands/man_drop_cache.py b/archmanweb/management/commands/man_drop_cache.py
new file mode 100755
index 0000000000000000000000000000000000000000..bc4c68a0b7fb6798811af3d96aedc1e70bc39f8c
--- /dev/null
+++ b/archmanweb/management/commands/man_drop_cache.py
@@ -0,0 +1,15 @@
+#! /usr/bin/env python3
+
+from django.core.management.base import BaseCommand
+from django.db import connection
+
+class Command(BaseCommand):
+    help = "Drops cached data from the database"
+
+    def handle(self, *args, **kwargs):
+        with connection.cursor() as c:
+            c.execute("UPDATE archmanweb_content SET html = NULL WHERE html IS NOT NULL;")
+            c.execute("UPDATE archmanweb_content SET txt = NULL WHERE txt IS NOT NULL;")
+            c.execute("UPDATE archmanweb_content SET description = NULL WHERE description IS NOT NULL;")
+            c.execute("UPDATE archmanweb_manpage SET converted_content_id = NULL WHERE converted_content_id IS NOT NULL;")
+            c.execute("VACUUM FULL archmanweb_content;")
diff --git a/update.py b/archmanweb/management/commands/man_update.py
similarity index 68%
rename from update.py
rename to archmanweb/management/commands/man_update.py
index 11dc4bc65073805df9c6145054f6e992198f2a8d..b4947a50b3a497e032db5b9da3c9ff46c70cbdc9 100755
--- a/update.py
+++ b/archmanweb/management/commands/man_update.py
@@ -10,12 +10,10 @@ import subprocess
import chardet
import pyalpm
-from finder import MANDIR, ManPagesFinder
+from archmanweb.management.utils.finder import MANDIR, ManPagesFinder
-# init django
-os.environ.setdefault("DJANGO_SETTINGS_MODULE", "mysite.settings")
+from django.core.management.base import BaseCommand
import django
-django.setup()
from django.db import connection, transaction
from django.db.models import Count
from archmanweb.models import Package, Content, ManPage, SymbolicLink, UpdateLog, SoelimError
@@ -313,93 +311,113 @@ def update_man_pages(finder, updated_pkgs):
return updated_pages
-if __name__ == "__main__":
- # init logging
- logger = logging.getLogger()
- logger.setLevel(logging.INFO)
- handler = logging.StreamHandler()
- formatter = logging.Formatter("{levelname:8} {message}", style="{")
- handler.setFormatter(formatter)
- logger.addHandler(handler)
-
- parser = argparse.ArgumentParser(description="update man pages in the django database")
- parser.add_argument("--force", action="store_true",
- help="force an import of man pages from all packages, even if they were not updated recently")
- parser.add_argument("--only-repos", action="store", nargs="+", metavar="NAME",
- help="import packages (and man pages) only from these repositories")
- parser.add_argument("--only-packages", action="store", nargs="+", metavar="NAME",
- help="import man pages only from these packages")
- parser.add_argument("--cache-dir", action="store", default="./.cache/",
- help="path to the cache directory (default: %(default)s)")
- parser.add_argument("--keep-tarballs", action="store_true",
- help="keep downloaded package tarballs in the cache directory")
- parser.add_argument("--workers", type=int, default=0,
- help="number of workers for parallel processing (0 = use 1 worker per CPU core)")
- args = parser.parse_args()
-
- start = datetime.datetime.now(tz=datetime.timezone.utc)
-
- finder = ManPagesFinder(args.cache_dir)
- finder.refresh()
-
- # everything in a single transaction
- with transaction.atomic():
- updated_pkgs = update_packages(finder, force=args.force, only_repos=args.only_repos)
- if args.only_packages is None:
- count_updated_pages = update_man_pages(finder, updated_pkgs)
- else:
- count_updated_pages = update_man_pages(finder, [p for p in updated_pkgs if p.name in args.only_packages])
-
- # this is called outside of the transaction, so that the cache can be reused on errors
- if args.keep_tarballs is False:
- finder.clear_pkgcache()
-
- # convert manual pages to plain-text
- # (one transaction per update, otherwise we might hit memory allocation error)
- def worker(man_id):
- man = ManPage.objects.get(id=man_id)
- try:
- man.get_converted("txt")
- except SoelimError as e:
- logger.error("SoelimError ({}) while converting {}.{}.{} to txt".format(str(e), man.name, man.section, man.lang))
- except subprocess.CalledProcessError as e:
- logger.error("CalledProcessError while converting {}.{}.{} to txt:\nreturncode = {}\nstderr = {}"
- .format(man.name, man.section, man.lang, e.returncode, e.stderr))
-
- # prepare man page IDs which need to be converted
- # (queryset needs to be a list for multiprocessing to work)
- queryset = ManPage.objects.only("package", "lang", "content_id", "converted_content_id").filter(content__txt=None).values_list("id", flat=True)
- queryset = list(queryset)
-
- # all existing database connections have to be closed before forking,
- # each process will then recreate its own connection:
- # https://stackoverflow.com/a/10684672
- django.db.connections.close_all()
-
- # parallel processing of the queryset
- import concurrent.futures
- with concurrent.futures.ProcessPoolExecutor(max_workers=args.workers or None) as executor:
- executor.map(worker, queryset)
-
- # VACUUM cannot run inside a transaction block
- if updated_pkgs or args.only_packages is not None:
- logger.info("Running VACUUM FULL ANALYZE on our tables...")
- for Model in [Package, Content, ManPage, SymbolicLink]:
- table = Model.objects.model._meta.db_table
- logger.info("--> {}".format(table))
- with connection.cursor() as cursor:
- cursor.execute("VACUUM FULL ANALYZE {};".format(table))
-
- end = datetime.datetime.now(tz=datetime.timezone.utc)
-
- # log update
- log = UpdateLog()
- log.timestamp = start
- log.duration = end - start
- log.updated_pkgs = len(updated_pkgs)
- log.updated_pages = count_updated_pages
- log.stats_count_man_pages = ManPage.objects.count()
- log.stats_count_symlinks = SymbolicLink.objects.count()
- log.stats_count_all_pkgs = Package.objects.count()
- log.stats_count_pkgs_with_mans = ManPage.objects.aggregate(Count("package_id", distinct=True))["package_id__count"]
- log.save()
+class Command(BaseCommand):
+    help = "Update man pages in the Django database"
+
+    def __init__(self, *args, **kwargs):
+        BaseCommand.__init__(self, *args, **kwargs)
+
+        # TODO: use Django settings to configure the logger
+        # https://docs.djangoproject.com/en/3.1/topics/logging/
+        logger = logging.getLogger()
+        logger.setLevel(logging.INFO)
+        handler = logging.StreamHandler()
+        formatter = logging.Formatter("{levelname:8} {message}", style="{")
+        handler.setFormatter(formatter)
+        logger.addHandler(handler)
+
+    def add_arguments(self, parser):
+        """
+        :param parser: an instance of :py:class:`argparse.ArgumentParser`
+        """
+        parser.add_argument("--force", action="store_true",
+                help="force an import of man pages from all packages, even if they were not updated recently")
+        parser.add_argument("--only-repos", action="store", nargs="+", metavar="NAME",
+                help="import packages (and man pages) only from these repositories")
+        parser.add_argument("--only-packages", action="store", nargs="+", metavar="NAME",
+                help="import man pages only from these packages")
+        parser.add_argument("--cache-dir", action="store", default="./.cache/",
+                help="path to the cache directory (default: %(default)s)")
+        parser.add_argument("--keep-tarballs", action="store_true",
+                help="keep downloaded package tarballs in the cache directory")
+        parser.add_argument("--workers", type=int, default=0,
+                help="number of workers for parallel processing (0 = use 1 worker per CPU core; default: %(default)s)")
+
+    def handle(self, **kwargs):
+        start = datetime.datetime.now(tz=datetime.timezone.utc)
+        updated_pkgs, count_updated_pages = self.do_update(**kwargs)
+        end = datetime.datetime.now(tz=datetime.timezone.utc)
+
+        # log update
+        log = UpdateLog()
+        log.timestamp = start
+        log.duration = end - start
+        log.updated_pkgs = len(updated_pkgs)
+        log.updated_pages = count_updated_pages
+        log.stats_count_man_pages = ManPage.objects.count()
+        log.stats_count_symlinks = SymbolicLink.objects.count()
+        log.stats_count_all_pkgs = Package.objects.count()
+        log.stats_count_pkgs_with_mans = ManPage.objects.aggregate(Count("package_id", distinct=True))["package_id__count"]
+        log.save()
+
+    def do_update(self, *, cache_dir, workers,
+                  force=False,
+                  only_repos=None,
+                  only_packages=None,
+                  keep_tarballs=False,
+                  **kwargs):
+        finder = ManPagesFinder(cache_dir)
+        finder.refresh()
+
+        # everything in a single transaction
+        with transaction.atomic():
+            updated_pkgs = update_packages(finder, force=force, only_repos=only_repos)
+            if only_packages is None:
+                count_updated_pages = update_man_pages(finder, updated_pkgs)
+            else:
+                count_updated_pages = update_man_pages(finder, [p for p in updated_pkgs if p.name in only_packages])
+
+        # this is called outside of the transaction, so that the cache can be reused on errors
+        if keep_tarballs is False:
+            finder.clear_pkgcache()
+
+        # convert manual pages to plain-text
+        # (one transaction per update, otherwise we might hit memory allocation error)
+        def worker(man_id):
+            man = ManPage.objects.get(id=man_id)
+            try:
+                man.get_converted("txt")
+            except SoelimError as e:
+                logger.error("SoelimError ({}) while converting {}.{}.{} to txt".format(str(e), man.name, man.section, man.lang))
+            except subprocess.CalledProcessError as e:
+                logger.error("CalledProcessError while converting {}.{}.{} to txt:\nreturncode = {}\nstderr = {}"
+                        .format(man.name, man.section, man.lang, e.returncode, e.stderr))
+
+        # prepare man page IDs which need to be converted
+        # (queryset needs to be a list for multiprocessing to work)
+        queryset = ManPage.objects.only("package", "lang", "content_id", "converted_content_id").filter(content__txt=None).values_list("id", flat=True)
+        queryset = list(queryset)
+
+        # all existing database connections have to be closed before forking,
+        # each process will then recreate its own connection:
+        # https://stackoverflow.com/a/10684672
+        django.db.connections.close_all()
+
+        # parallel processing of the queryset
+        import concurrent.futures
+        # FIXME: ProcessPoolExecutor deadlocks here since the code was moved into the Command class,
+        # even though the database connections are closed just above, which used to work before.
+        #with concurrent.futures.ProcessPoolExecutor(max_workers=workers or None) as executor:
+        with concurrent.futures.ThreadPoolExecutor(max_workers=workers or None) as executor:
+            executor.map(worker, queryset)
+
+        # VACUUM cannot run inside a transaction block
+        if updated_pkgs or only_packages is not None:
+            logger.info("Running VACUUM FULL ANALYZE on our tables...")
+            for Model in [Package, Content, ManPage, SymbolicLink]:
+                table = Model.objects.model._meta.db_table
+                logger.info("--> {}".format(table))
+                with connection.cursor() as cursor:
+                    cursor.execute("VACUUM FULL ANALYZE {};".format(table))
+
+        return updated_pkgs, count_updated_pages
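One possible explanation for the FIXME in `do_update` (an assumption, not a confirmed diagnosis): `ProcessPoolExecutor` has to pickle the callable it sends to its worker processes, and `worker` is now a function nested inside a method, whereas in the old `update.py` it was defined at module scope. The standard `pickle` module cannot serialize nested functions, which the sketch below demonstrates without starting any processes. The function names are illustrative only.

```python
import pickle


def make_nested_worker():
    """Mimic the new layout: a worker defined inside another function."""
    def worker(man_id):
        return man_id
    return worker


def is_picklable(obj):
    """Return True if obj can be serialized with pickle."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


if __name__ == "__main__":
    # Built-in and module-level functions are pickled by reference.
    print(is_picklable(len))
    # A nested function has no importable qualified name, so pickling fails.
    print(is_picklable(make_nested_worker()))
```

If this turns out to be the cause, moving `worker` to module level would let the process pool be restored; keeping `ThreadPoolExecutor`, as the patch does, sidesteps pickling entirely.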
diff --git a/finder.py b/archmanweb/management/utils/finder.py
similarity index 97%
rename from finder.py
rename to archmanweb/management/utils/finder.py
index 9e410053959007c523d9b3d5db654eaa851e62a9..e0c4cab5fb6278bdf0d07723954e49bd2d826e2f 100644
--- a/finder.py
+++ b/archmanweb/management/utils/finder.py
@@ -188,9 +188,9 @@ class ManPagesFinder:
# get the pkg tarball
_pattern = "{}-{}-{}.pkg.tar.*".format(pkg.name, pkg.version, pkg.arch)
- if not list(self.cachedir.glob(_pattern)):
+ if not list(f for f in self.cachedir.glob(_pattern) if not str(f).endswith(".part")):
self._download_package(pkg)
- tarballs = sorted(self.cachedir.glob(_pattern))
+ tarballs = sorted(f for f in self.cachedir.glob(_pattern) if not str(f).endswith(".part"))
assert len(tarballs) > 0, _pattern
tarball = tarballs[0]
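The `.part` filter now appears twice in this hunk. A small helper could keep the two call sites in sync; the sketch below is self-contained, and the name `find_tarballs` and its parameters are hypothetical, not part of `ManPagesFinder`.

```python
from pathlib import Path


def find_tarballs(cachedir, name, version, arch):
    """Return cached package tarballs, skipping partial downloads (*.part)."""
    pattern = "{}-{}-{}.pkg.tar.*".format(name, version, arch)
    return sorted(
        f for f in Path(cachedir).glob(pattern)
        if not f.name.endswith(".part")
    )
```

The download check would then become `if not find_tarballs(...)`, and the later lookup would reuse the same call, so both always agree on what counts as a finished tarball.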
diff --git a/archmanweb/static/archmanweb/base.css b/archmanweb/static/archmanweb/base.css
index fcec57a4146fdd0add7628f1da9948716d69e914..4f1eb1d6f39b85b0ccdd1ad672db65ef6fadd9da 100644
--- a/archmanweb/static/archmanweb/base.css
+++ b/archmanweb/static/archmanweb/base.css
@@ -1,48 +1,8 @@
-header {
- padding: 10px 15px;
- background: #333;
- border-bottom: 5px #08c solid;
- color: white !important;
- display: flex;
- justify-content: space-between;
- align-items: center;
- font-family: sans-serif;
-}
-header > h1 {
- font-size: 1.25em;
- margin: 0;
-}
-header > nav {
- margin-bottom: 0;
-}
-header > nav > a {
- font-weight: bold;
- font-size: 1em;
- text-decoration: none;
- color: #999 !important;
-}
-header > nav > a:focus,
-header > nav > a:hover {
- text-decoration: underline;
- color: white !important;
-}
-header > nav > a.selected {
- color: white !important;
-}
-header > nav > a:not(:last-child) {
- margin-right: 1em;
-}
-
-/* responsive header */
-@media only screen and (max-width: 650px) {
- header {
- /* place the items in vertical direction */
- flex-direction: column;
- }
-
- header > :not(:last-child) {
- margin: 0 0 0.5rem;
- }
+#archnavbar form {
+ display: inline-block !important;
+ font-size: 14px !important;
+ line-height: 14px !important;
+ padding: 14px 15px 0px !important;
}
/* simple reset */
diff --git a/archmanweb/templates/about.html b/archmanweb/templates/about.html
deleted file mode 100644
index 6c6d2491254b245a393aa244b17fefa9e9d876e8..0000000000000000000000000000000000000000
--- a/archmanweb/templates/about.html
+++ /dev/null
@@ -1,69 +0,0 @@
-{% extends "index.html" %}
-
-{% block content %}
-
-
-
About
-
-
- This website was created for the man template
- on the Arch wiki. Originally, the template replaced plain text, unclickable references to man pages with links
- to man7.org, which contains a handful of manuals taken directly
- from upstream. Later, we considered switching to another site providing more manuals. Since we did not find
- a suitable external site, we decided to build a new service to satisfy all our requirements:
-
-
-
All man pages from official Arch packages are available. Old versions and permalinks are not necessary.
-
Functionality does not require Javascript.
-
Pages are addressable by their name and section, both occurring exactly once in the URL to avoid
- problems with pages such as ar(1) and
- ar(1p).
-
-
The URLs used by the man template should not redirect to permalinks, otherwise users would start
- copy-pasting them to the wiki and it would be hard to check if they are the same as the canonical URLs.
-
-
Human-readable subsection anchors.
-
The page should clearly indicate the Arch package version containing the page.
- We used a dynamic approach instead of building a website consisting of completely static pages. The main
- building blocks are the Django web framework, the
- PostgreSQL database server, the mandoc tool from
- the mandoc toolset for the conversion to HTML and the
- pyalpm library for data extraction from the Arch repositories.
- The code is available in the lahwaacz/archmanweb
- repository at GitHub.
-
-
-
- Overall, this approach allows us to provide the following features without rebuilding the whole website
- from scratch:
-
-
-
Listings with custom filters and orderings.
-
Links to other versions of the same manual provided by different packages.
-
Links to similar manuals available in other sections or languages.
-
Searching in the names and descriptions of packages and manuals, similarly to
- apropos(1).
-
-
-
Similar projects
-
-
- Some similar projects, each using a different approach, are:
-