Power Outages and Gunicorn PID files
This week, my neighborhood had a few power outages. Bad news for the uptime of my self-hosted sites! 😅
After the power came back and my server started, Gunicorn services failed with the following error:
[2024-11-08 19:13:18 +0000] [1806] [INFO] Starting gunicorn 23.0.0
Error: Already running on PID 1401 (or pid file 'gunicorn.pid' is stale)
Main process exited, code=exited, status=1/FAILURE
Failed with result 'exit-code'.
Gunicorn usually deletes the PID file when it shuts down. But if the power goes out, the clean-up code does not have a chance to run. Because I stored the PID files in my $HOME directory, they didn’t get cleaned up after reboot, causing the error above.
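To make "stale" a bit more concrete, here is a minimal sketch of how a script could check whether a PID file still points at a live process. The pid_file_status helper is a made-up name for illustration, not something Gunicorn ships. Note that a check like this cannot tell Gunicorn apart from an unrelated process that happens to get the same PID after a reboot, which is why a leftover file is a problem in the first place.

```shell
#!/bin/sh
# Hypothetical helper (not part of Gunicorn): classify a PID file as
# "missing", "stale", or "running".
pid_file_status() {
    pid_file=$1

    if [ ! -f "$pid_file" ]; then
        echo "missing"
        return 0
    fi

    pid=$(cat "$pid_file")

    # Anything that is not a plain positive integer cannot be a valid PID.
    case $pid in
        ''|*[!0-9]*) echo "stale"; return 0 ;;
    esac

    # kill -0 sends no signal; it only checks that the PID exists and that
    # we are allowed to signal it. A process owned by another user will
    # therefore also be reported as "stale" by this simple check.
    if kill -0 "$pid" 2>/dev/null; then
        echo "running"
    else
        echo "stale"
    fi
}
```

Calling pid_file_status gunicorn.pid before starting the service would show whether the file is safe to delete, with the caveat about PID reuse mentioned above.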
Deleting PID files on reboot
I started looking for a way to delete the PID files on reboot and found posts about the /var/run folder. /var/run is usually mounted as a tmpfs, so all its contents are deleted when your server shuts down.
I peeked into the folder and saw PID files for nginx, sshd, and others, so I thought I was on the right track!
But I ran into an issue when I tried to create my Gunicorn PID files inside /var/run. Only the root user has write permission, and I run all my Gunicorn processes as an unprivileged user. 🤦♂️
A bit more Googling and I came across /var/run/user/$UID, but this SO answer convinced me that storing my PID files there is not a good idea.
/tmp
I also considered storing the PID files inside /tmp, but from what I can tell, different distros have different policies on when the /tmp folder is cleared. It could be on reboot, but it could also be based on the file’s age, and I didn’t want my PID file to disappear after a few days!
Taking a step back
I decided to take a step back and think about why I needed the PID files in the first place.
The sole reason was that my update script uses them to send the HUP signal to the Gunicorn process (it’s how I do No Downtime Deployments). But reading the PID file isn’t the only way to find Gunicorn’s PID!
Another way is to grep through the output of the ps axf command. The solution I ended up using is this bash one-liner:
ps axf | grep 'gunicorn: master \[fedidevs\]' | awk '{print "kill -hup " $1}' | sh
The one-liner has a face only a mother could love, but it’s better than dealing with services that do not start on boot because of a stale PID file, so this is my solution for now!
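For a deploy script, the same idea can be split into two small functions, which makes the "no master found" case explicit. This is only a sketch: master_pid and reload_app are hypothetical names, and it assumes the process title looks like the one in the one-liner above ("gunicorn: master [appname]"). The app name is passed to awk through the environment rather than on the command line so that the awk process itself does not show up as a match in the ps output.

```shell
#!/bin/sh
# Hypothetical helper: read `ps` output on stdin and print the PID of the
# first line whose title contains "gunicorn: master [<app>]" literally.
master_pid() {
    APP="$1" awk '
        index($0, "gunicorn: master [" ENVIRON["APP"] "]") { print $1; exit }
    '
}

# Hypothetical helper: send HUP to the Gunicorn master of the given app.
reload_app() {
    pid=$(ps axo pid,args | master_pid "$1")
    if [ -z "$pid" ]; then
        echo "no Gunicorn master found for $1" >&2
        return 1
    fi
    kill -HUP "$pid"
}
```

With these defined, the update script would call reload_app fedidevs and could stop the deployment if the function returns a non-zero status.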
systemd
If you are using systemd to start your gunicorn process, then you can use the $MAINPID environment variable, which systemd sets for the ExecReload and ExecStop commands of your .service file. Here’s a full example for my fedidevs site:
[Unit]
Description=Fedidevs
After=postgresql.service
After=nginx.service
After=redis.service
[Service]
User=www-data
Group=www-data
WorkingDirectory=/var/apps/fedidevs
ExecStart=/var/apps/fedidevs/.venv/bin/gunicorn fedidevs.wsgi
ExecReload=/bin/kill -s HUP $MAINPID
ExecStop=/bin/kill -s TERM $MAINPID
Type=simple
Restart=always
RestartSec=1
[Install]
WantedBy=multi-user.target
Shoutout to Rémy on Bluesky for letting me know about $MAINPID and sharing his .service file template!
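With a unit file like the one above, the update script no longer needs to find the PID at all: asking systemd to reload the unit runs the ExecReload command, which sends HUP to $MAINPID. A minimal sketch, assuming the unit is installed as fedidevs.service (the unit name and the reload_service helper are my assumptions) and that the script runs with enough privileges to call systemctl:

```shell
#!/bin/sh
# Hypothetical helper: ask systemd to run the unit's ExecReload command.
# Needs root or equivalent permissions for the given unit.
reload_service() {
    unit=${1:?usage: reload_service UNIT}
    systemctl reload "$unit"
}
```

The update script would then end with reload_service fedidevs.service instead of the ps one-liner.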
Is there a better way?
Probably, but I was not able to figure it out. Please let me know if you know a better way to do this!
P.S.: LLMs are useless when answering questions about PID files. ChatGPT even thought I would want to persist PID files after a reboot.