Postmortem 06.07.16

Issue

We had a brief outage this morning due to a disk space issue on our Sidekiq EC2 instance. One set of log files grew to almost 20GB, which is strange because they are setup to be rotated using logrotate. The files should have been limited to 700MB max. According to logrotate status, it last ran on 6-1-16, when it should run daily via cron.

Fix

logrotate is now setup to run hourly, and we are going have monit monitor disk space on all of our EC2 instances so we can catch this issue in the future.