09 Nov 2024

Power Outages and Gunicorn PID files

This week, my neighborhood had a few power outages. Bad news for the uptime of my self-hosted sites! 😅

After the power came back and my server started, Gunicorn services failed with the following error:

[2024-11-08 19:13:18 +0000] [1806] [INFO] Starting gunicorn 23.0.0
Error: Already running on PID 1401 (or pid file 'gunicorn.pid' is stale)
Main process exited, code=exited, status=1/FAILURE
Failed with result 'exit-code'.

Gunicorn usually deletes the PID file when it shuts down. But if the power goes out, the clean-up code does not have a chance to run. Because I stored the PID files in my $HOME directory, they didn’t get cleaned up after reboot, causing the error above.

Deleting PID files on reboot

I started looking for a solution to deleting the PID files on reboot and found posts about the /var/run folder. var/run is usually mounted as a tmpfs, so all its contents are deleted when your server shuts down.

I peeked into the folder and saw PID files for nginx, sshd, and others, so I thought I was on the right track!

But I ran into an issue when I tried to create my Gunicorn PID files inside /var/run. Only the root user has write permission, and I run all my Gunicorn processes as an unprivileged user. 🤦‍♂️

A bit more Googling and I came across /var/run/user/$UID, but this SO answer convinced me that storing my PID files there is not a good idea.

`/tmp`

I also considered storing the PID files inside /tmp, but from what I can tell, different distros have different policies on when the /tmp folder is cleared. It could be on reboot, but it could also be based on the file’s age, and I didn’t want my PID file to disappear after a few days!

Taking a step back

I decided to take a step back and think about why I needed the PID files in the first place.

The sole reason was that my update script uses it to send the -hup signal to the Gunicorn process (it’s how I do No Downtime Deployments). But reading the PID file isn’t the only way to find Gunicorn’s PID!

Another way is to grep through ps axf command output. The solution I ended up using is this bash one-liner:

ps axf | grep 'gunicorn: master \[fedidevs\]' | awk '{print "kill -hup " $1}' | sh

The one-liner has a face only a mother could love, but it’s better than dealing with services that do not start on boot because of a stale PID file, so this is my solution for now!

SystemD

If you are using SystemD to start your gunicorn process, then you can use the $MAINPID environment variable set in the ExecReload and ExecStop commands of your .service file. Here’s a full example for my fedidevs site:

[Unit]
Description=Fedidevs
After=postgresql.service
After=nginx.service
After=redis.service

[Service]
User=www-data
Group=www-data
WorkingDirectory=/var/apps/fedidevs
ExecStart=/var/apps/fedidevs/.venv/bin/gunicorn fedidevs.wsgi
ExecReload=/bin/kill -s HUP $MAINPID
ExecStop=/bin/kill -s TERM $MANPID
Type=simple
Restart=always
RestartSec=1

[Install]
WantedBy=multi-user.target

Shoutout to Rémy on BlueSky for letting me know about $MAINPID and sharing his .service file template!

Is there a better way?

Probably, but I was not able to figure it out. Please let me know if you know a better way to do this!

P.S.: LLMs are useless when answering questions about PID files. ChatGPT even thought I would want to persist PID files after a reboot:

Tablet screenshot

Anže's Blog

Python, Django, and the Web