Verified Commit dfbbafa2 authored by Kristian Klausen

archwiki: Do page view caching[1] with nginx for improved performance

We have used MediaWiki's file cache[2] until now, but recently the wiki
has been hammered with requests from some stupid Chinese bots/crawlers.

Caching at the web server level is faster because we avoid the PHP overhead,
and it seems to make a difference performance-wise, especially when
the bots/crawlers are hitting us.

This is usually done with Varnish[3], but I went with a simple Python
service (30 LOC) for handling the PURGE requests, as that is much simpler
than adding Varnish to our stack.

[1] https://www.mediawiki.org/w/index.php?title=Manual:Performance_tuning&oldid=6670283#Page_view_caching
[2] https://www.mediawiki.org/wiki/Manual:File_cache
[3] https://www.mediawiki.org/wiki/Manual:Varnish_caching

Fix #315
parent c31633b9
1 merge request: !846 archwiki: Do page view caching[1] with nginx for improved performance
#!/usr/bin/env python
import hashlib
import http.server
import pathlib
import socketserver
import urllib.parse

socketserver.ThreadingTCPServer.allow_reuse_address = True


class Handler(http.server.BaseHTTPRequestHandler):
    def do_PURGE(self):
        self.send_response(http.HTTPStatus.OK)
        self.end_headers()
        o = urllib.parse.urlparse(self.path)
        for method in ["GET", "HEAD"]:
            # Please keep in sync with "fastcgi_cache_key" in nginx.d.conf.j2
            if o.query:
                cache_key = f"https{method}{o.netloc}{o.path}?{o.query}"
            else:
                cache_key = f"https{method}{o.netloc}{o.path}"
            hash = hashlib.md5(cache_key.encode("utf-8")).hexdigest()
            # Please keep in sync with "fastcgi_cache_path" in nginx.d.conf.j2
            pathlib.Path(
                f"/var/lib/nginx/cache/{hash[-1]}/{hash[-3:-1]}/{hash}"
            ).unlink(missing_ok=True)


httpd = http.server.ThreadingHTTPServer(("127.0.0.1", 1080), Handler)
httpd.serve_forever()
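To try the purge logic without touching a real nginx cache, the handler above can be parameterized on the cache directory and run against a throwaway tree. This is a minimal sketch under stated assumptions, not the deployed script: `cache_file`, `make_handler`, `purge`, `demo`, and the `wiki.example` URL are hypothetical names introduced for illustration. Unlike the script, the sketch unlinks the files before sending the response, so that a client can check the result as soon as the response arrives.

```python
import hashlib
import http.client
import http.server
import pathlib
import tempfile
import threading
import urllib.parse


def cache_file(cache_root, method, url):
    """Map a request method and absolute URL to the nginx cache file path."""
    o = urllib.parse.urlparse(url)
    # Mirrors "$scheme$request_method$host$request_uri" with a fixed "https" scheme.
    key = f"https{method}{o.netloc}{o.path}"
    if o.query:
        key += f"?{o.query}"
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    # levels=1:2 -> last hex character, then the two characters before it.
    return pathlib.Path(cache_root) / digest[-1] / digest[-3:-1] / digest


def make_handler(cache_root):
    """Build a handler class bound to cache_root (hypothetical helper)."""

    class Handler(http.server.BaseHTTPRequestHandler):
        def do_PURGE(self):
            # Remove the cached GET and HEAD variants, then acknowledge.
            for method in ("GET", "HEAD"):
                cache_file(cache_root, method, self.path).unlink(missing_ok=True)
            self.send_response(http.HTTPStatus.OK)
            self.end_headers()

        def log_message(self, format, *args):
            pass  # keep the demo quiet

    return Handler


def purge(host, port, url):
    """Send a PURGE request for url and return the HTTP status code."""
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        conn.request("PURGE", url)
        response = conn.getresponse()
        response.read()
        return response.status
    finally:
        conn.close()


def demo():
    """Purge a fake cache entry in a temp dir; return (status, still_exists)."""
    url = "http://wiki.example/index.php?title=Main_page"  # made-up URL
    with tempfile.TemporaryDirectory() as root:
        entry = cache_file(root, "GET", url)
        entry.parent.mkdir(parents=True)
        entry.write_bytes(b"cached response")
        # Port 0 asks the OS for any free port.
        server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), make_handler(root))
        threading.Thread(target=server.serve_forever, daemon=True).start()
        try:
            status = purge("127.0.0.1", server.server_address[1], url)
        finally:
            server.shutdown()
            server.server_close()
        return status, entry.exists()
```

Calling `demo()` starts the server on an ephemeral port, purges the fake entry, and reports whether the file survived; a missing file is not an error because of `missing_ok=True`.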
......@@ -97,16 +97,18 @@
- name: Start and enable memcached service
  systemd: name=memcached@archwiki.service state=started enabled=true daemon_reload=true

- name: Install nginx-cache-purge script
  copy: src=nginx-cache-purge dest=/usr/local/bin/nginx-cache-purge owner=root group=root mode=0755

- name: Install systemd services/timers
  template: src="{{ item }}.j2" dest="/etc/systemd/system/{{ item }}" owner=root group=root mode=0644
  loop:
    - archwiki-runjobs.service
    - archwiki-runjobs-wait.service
    - archwiki-runjobs.timer
    - archwiki-prune-cache.service
    - archwiki-prune-cache.timer
    - archwiki-question-updater.service
    - archwiki-question-updater.timer
    - nginx-cache-purge.service

- name: Start and enable archwiki timers and services
  systemd:
......@@ -116,9 +118,9 @@
    daemon_reload: true
  with_items:
    - archwiki-runjobs.timer
    - archwiki-prune-cache.timer
    - archwiki-runjobs-wait.service
    - archwiki-question-updater.timer
    - nginx-cache-purge.service

- name: Create question answer file
  systemd:
......
......@@ -147,9 +147,9 @@ $wgMemCachedServers = [ "unix://{{ archwiki_memcached_socket }}" ];
## be publicly accessible from the web.
$wgCacheDirectory = "$IP/../cache/data";
$wgEnableSidebarCache = true;
$wgUseFileCache = true;
$wgFileCacheDirectory = "$IP/../cache/html";
$wgUseGzip = true;
$wgUseCdn = true;
$wgCdnServers = [ '127.0.0.1' ];
$wgInternalServer = 'http://wiki.archlinux.org';
# CSS-based preferences supposedly cause about 20 times slower page loads
# https://phabricator.wikimedia.org/rSVN63707
......
[Unit]
Description=Archwiki Prune Cache Service
[Service]
Type=oneshot
User={{ archwiki_user }}
WorkingDirectory={{ archwiki_dir }}
ExecStart=/usr/bin/php {{ archwiki_dir }}/public/maintenance/run.php pruneFileCache -q --agedays 1
NoNewPrivileges=yes
PrivateTmp=yes
PrivateDevices=yes
PrivateNetwork=true
ProtectSystem=full
ProtectHome=true
ProtectControlGroups=yes
ProtectKernelModules=yes
ProtectKernelTunables=yes
[Install]
WantedBy=multi-user.target
[Unit]
Description=Archwiki Prune Cache timer
[Timer]
OnCalendar=*-*-* 04:12:00
[Install]
WantedBy=timers.target
[Unit]
Description=nginx cache PURGE service
[Service]
User=http
ProtectSystem=strict
ReadWritePaths=/var/lib/nginx/cache
ExecStart=/usr/local/bin/nginx-cache-purge
[Install]
WantedBy=multi-user.target
fastcgi_cache_path /var/lib/nginx/cache levels=1:2 keys_zone=wiki:100m inactive=60m;
# Please keep "path" and "levels" in sync with nginx-cache-purge
fastcgi_cache_path /var/lib/nginx/cache levels=1:2 keys_zone=wiki:100m inactive=720m;
# Please keep in sync with "cache_key" in nginx-cache-purge
fastcgi_cache_key "$scheme$request_method$host$request_uri";
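The two "keep in sync" comments tie the purge script to these directives: nginx names each cache file after the MD5 of the cache key, and `levels=1:2` takes the directory names from the end of that digest. The following is a small sketch of that mapping, assuming only the documented `levels` behaviour; `nginx_cache_path` and `request_cache_key` are hypothetical helper names, and the example host is made up.

```python
import hashlib
import pathlib


def request_cache_key(scheme, method, host, request_uri):
    """Concatenate the variables in the same order as fastcgi_cache_key."""
    return f"{scheme}{method}{host}{request_uri}"


def nginx_cache_path(cache_root, levels, key):
    """Return the cache file path for `key` under `cache_root`.

    `levels` uses the same "1:2" notation as fastcgi_cache_path.
    """
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    path = pathlib.Path(cache_root)
    end = len(digest)
    for width in (int(part) for part in levels.split(":")):
        # Each level consumes `width` hex characters, working backwards
        # from the end of the digest.
        path /= digest[end - width:end]
        end -= width
    return path / digest


if __name__ == "__main__":
    key = request_cache_key(
        "https", "GET", "wiki.example", "/index.php?title=Main_page"
    )
    print(nginx_cache_path("/var/lib/nginx/cache", "1:2", key))
```

With `levels=1:2` this reduces to the `hash[-1]` / `hash[-3:-1]` slicing used in the purge script, which is why changing either `levels` or the key format on one side silently breaks purging on the other.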
# rate limit API endpoint
......@@ -110,10 +112,18 @@ server {
fastcgi_index index.php;
include fastcgi.conf;
{% block wiki_cache %}
fastcgi_cache wiki;
fastcgi_cache_valid 200 10m;
# This improves the cache hit ratio[1] and ensures that there is
# only a single cache file. Without this, nginx will use the
# Vary header as a secondary cache key[2], which breaks the
# cache purge service.
# [1] https://www.fastly.com/blog/best-practices-using-vary-header/
# [2] https://github.com/nginx/nginx/commit/1332e76b20a6a1e871904525d42b17dcaed81eec
fastcgi_ignore_headers Vary;
add_header X-Cache $upstream_cache_status;
{% endblock %}
}
# mediawiki API endpoint
......@@ -141,6 +151,11 @@ server {
fastcgi_index index.php;
include fastcgi.conf;
{{ self.wiki_cache() }}
# https://www.mediawiki.org/w/index.php?title=Manual:Varnish_caching&oldid=6230975#Configuring_Varnish
fastcgi_cache_bypass $http_authorization $cookie_archwiki_session $cookie_archwikiToken;
fastcgi_no_cache $http_authorization $cookie_archwiki_session $cookie_archwikiToken;
limit_req zone=archwikilimit burst=10 nodelay;
}
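The `fastcgi_cache_bypass` and `fastcgi_no_cache` pair above keeps authenticated traffic out of the cache: nginx skips the cache when at least one of the listed values is non-empty and not equal to "0". The rule can be modelled in a few lines of Python; this is an illustrative sketch only, and `is_set` and `skips_cache` are hypothetical names, not part of nginx or MediaWiki.

```python
def is_set(value):
    """nginx treats a parameter as set when it is non-empty and not "0"."""
    return value not in (None, "", "0")


def skips_cache(authorization=None, cookies=None):
    """Model the fastcgi_cache_bypass / fastcgi_no_cache parameters above."""
    cookies = cookies or {}
    candidates = (
        authorization,                    # $http_authorization
        cookies.get("archwiki_session"),  # $cookie_archwiki_session
        cookies.get("archwikiToken"),     # $cookie_archwikiToken
    )
    return any(is_set(value) for value in candidates)


if __name__ == "__main__":
    # Anonymous request: eligible for the cache.
    print(skips_cache())
    # Logged-in request: neither served from nor stored in the cache.
    print(skips_cache(cookies={"archwiki_session": "abc123"}))
```

Because both directives list the same variables, a request that bypasses the cache also never populates it, so a logged-in user's page cannot be served to anonymous visitors.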
......