Postmortem 06.07.16
Issue
We had a brief outage this morning due to a disk space issue on our Sidekiq EC2 instance. One set of log files grew to
almost 20GB, which is strange because they are set up to be rotated using logrotate. The files should have been limited to
700MB max. According to the logrotate status, it last ran on 6-1-16, when it should run daily via cron.
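As an illustration of how oversized log files like these could be spotted early, here is a minimal Python sketch. It is not something we ran during the incident. The function name `oversized_logs` and the directory passed to it are hypothetical; only the 700MB limit comes from the text above.

```python
import os

# 700MB cap taken from the postmortem text; everything else is illustrative.
LIMIT_BYTES = 700 * 1024 * 1024


def oversized_logs(log_dir, limit_bytes=LIMIT_BYTES):
    """Return (path, size) pairs for regular files in log_dir above limit_bytes.

    Results are sorted largest first. Subdirectories are not scanned.
    """
    results = []
    for entry in os.scandir(log_dir):
        # Skip directories and symlinks; only plain files are measured.
        if entry.is_file(follow_symlinks=False):
            size = entry.stat(follow_symlinks=False).st_size
            if size > limit_bytes:
                results.append((entry.path, size))
    return sorted(results, key=lambda item: item[1], reverse=True)
```

Calling `oversized_logs("/path/to/logs")` with the real log directory would list any file that has outgrown the limit. An empty list means rotation is keeping up.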
Fix
logrotate is now set up to run hourly, and we are going to have monit monitor disk space on
all of our EC2 instances so we can catch this issue in the future.
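The kind of disk space check described above can be sketched in a few lines of Python. This is not monit configuration and not what is deployed. The function names and the 80% threshold are assumptions chosen for illustration.

```python
import shutil


def disk_usage_percent(path="/"):
    """Return the percentage of the filesystem containing path that is in use."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def is_over_threshold(path="/", threshold_percent=80.0):
    """Return True when usage is at or above threshold_percent.

    The 80% default is an assumed value, not one taken from our setup.
    """
    return disk_usage_percent(path) >= threshold_percent
```

A scheduled job could call `is_over_threshold("/")` and raise an alert when it returns True, which is roughly the behavior we expect from the monit check.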