What's New
New updates and improvements to Hatchbox.io
September 18th, 2025 Incident
Update
On September 18th, 2025, Hatchbox was unavailable for several hours.
What happened
We discovered that the disk on the Hatchbox server was full. The cause was heavy usage overnight that wrote a significant amount of deployment logs.
We also saw errors that the database host was inaccessible. This could have been a side effect of the full disk, but it appeared to be unrelated. The application was connecting to the database on the same server over its private IP address and was failing to resolve the IP. It's possible this was caused by network maintenance in the datacenter overnight.
Our uptime monitoring checked for a 200 OK response from https://app.hatchbox.io. However, the homepage doesn't query the database, which meant the uptime check continued to pass even during the outage.
Going forward
To restore free disk space, we cleared several caches on the server (RubyGems, bootsnap, APT, journalctl, and others) and removed older deployment logs from Hatchbox applications.
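For anyone curious what that kind of cleanup looks like, here's a rough sketch in Python. The paths, retention windows, and exact commands are illustrative assumptions, not the precise steps we ran on the Hatchbox server.

```python
import subprocess

# Illustrative cleanup steps. The paths and retention windows are assumptions,
# not the exact commands run on the Hatchbox server.
cleanup_commands = [
    ["gem", "cleanup"],                                    # remove old RubyGems versions
    ["rm", "-rf", "/home/deploy/app/tmp/cache/bootsnap"],  # clear the bootsnap cache (path assumed)
    ["apt-get", "clean"],                                  # drop the APT package cache
    ["journalctl", "--vacuum-time=14d"],                   # trim systemd journal logs to 14 days
    # delete deployment logs older than 30 days (directory assumed)
    ["find", "/home/deploy/deployment_logs", "-type", "f", "-mtime", "+30", "-delete"],
]

for cmd in cleanup_commands:
    print("Running:", " ".join(cmd))
    subprocess.run(cmd, check=False)  # keep going even if one step fails
```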
However, to fully free up the disk, we needed to vacuum the Postgres database. Since VACUUM FULL requires roughly twice the table's disk space while it rewrites the table, we mounted a temporary volume, moved the table to a tablespace on that volume, vacuumed it there, and then moved it back to the server's disk. This freed up over 50% of the disk space on the server. We also vacuumed some other tables and now clean up PGHero stats every 14 days to reduce their disk usage.
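Here's a rough sketch of that tablespace shuffle, assuming a hypothetical deployments table and a volume mounted at /mnt/scratch; the real table names and paths were different, and the statements need superuser access.

```python
import psycopg2

# Sketch of freeing space with VACUUM FULL when the main disk lacks headroom.
# Connection details, table name, and mount point are assumptions for illustration.
conn = psycopg2.connect("dbname=hatchbox host=localhost user=postgres")
conn.autocommit = True  # VACUUM cannot run inside a transaction block
cur = conn.cursor()

# 1. Create a tablespace on the freshly mounted volume
#    (the directory must exist and be owned by the postgres user).
cur.execute("CREATE TABLESPACE tmp_vacuum LOCATION '/mnt/scratch/pg_tmp'")

# 2. Move the bloated table onto the volume, then rewrite it there.
cur.execute("ALTER TABLE deployments SET TABLESPACE tmp_vacuum")
cur.execute("VACUUM FULL deployments")

# 3. Move the now-compact table back to the server's disk and drop the scratch tablespace.
cur.execute("ALTER TABLE deployments SET TABLESPACE pg_default")
cur.execute("DROP TABLESPACE tmp_vacuum")

cur.close()
conn.close()
```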
Restarting the server resolved the database IP address issue. However, the Hatchbox database runs locally, and the database host was misconfigured to use the server's private IP address instead of localhost. That introduced an unnecessary dependency on the private network, so switching the host to localhost should make connections both more reliable and faster.
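The change itself is just a host swap in the database configuration. As a toy illustration (our actual configuration is a Rails database config, not a Python script, and the private address shown is made up):

```python
import psycopg2

# Before: the host was the server's private network address (made-up example IP),
# so every query depended on the private network being reachable.
# conn = psycopg2.connect("dbname=hatchbox host=10.0.0.2 user=hatchbox")

# After: the database runs on the same machine, so localhost removes the
# private-network dependency entirely.
conn = psycopg2.connect("dbname=hatchbox host=localhost user=hatchbox")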
We have also modified our uptime check to hit a page that queries the database, so the check only passes when the application is fully working.
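As an illustration of the idea, here's a minimal sketch of a health endpoint that touches the database. It's written in Python with Flask for brevity and is not our actual endpoint; the connection settings are assumptions.

```python
from flask import Flask
import psycopg2

app = Flask(__name__)

@app.route("/health")
def health():
    # Run a trivial query so the check fails whenever the database is unreachable,
    # instead of passing on a static page that never touches the database.
    try:
        conn = psycopg2.connect("dbname=hatchbox host=localhost user=hatchbox",
                                connect_timeout=2)
        with conn.cursor() as cur:
            cur.execute("SELECT 1")
            cur.fetchone()
        conn.close()
        return "ok", 200
    except Exception:
        return "database unavailable", 503

if __name__ == "__main__":
    app.run(port=3000)
```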
Reliability is a huge priority for us, so we'll be improving our own infrastructure to make sure these types of problems don't happen again. We'll also apply these lessons to Hatchbox itself to protect your apps and servers from the same issues.
We know these things create tough times for everyone, and we appreciate your understanding and patience while we got this resolved. Thanks as always for using Hatchbox, and we apologize for any headaches this may have caused.