It's always great to see our stuff being used in action, solving real problems. Today, we've got a "real-world" story to tell about Dawn™.

One of the first sites we've been using Dawn for belongs to a Wellington-based company that has about 120,000 customers and whose primary source of revenue is its website. We host and support the site, and we use Dawn to monitor it.

In July, about a week after we started using Dawn for the site, we noticed a load spike that should have been manageable but instead caused the server to run out of memory.

We compared the Dawn graphs for memory usage and active Apache processes. This showed us that the problem was caused by the number of concurrent Apache processes climbing far too high. So we set Apache's MaxClients setting (which determines the maximum number of simultaneous Apache processes) to a more suitable value, and adjusted our configuration to cope with load more gracefully.
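For readers wondering how a MaxClients value can be chosen, here is a rough sketch of the usual back-of-the-envelope sum: take the memory you can spare for Apache and divide it by the average size of one Apache process. This is not the calculation we ran for this site. The function name and all of the figures below are made up for illustration, and you would want to measure your own process sizes before relying on it.

```python
def estimate_max_clients(total_ram_mb, reserved_mb, avg_process_mb):
    """Estimate how many Apache processes fit in memory.

    total_ram_mb   -- physical RAM on the server (hypothetical figure)
    reserved_mb    -- RAM kept back for the OS, database, caches, etc.
    avg_process_mb -- average resident size of one Apache process
    """
    if avg_process_mb <= 0:
        raise ValueError("avg_process_mb must be positive")
    available_mb = total_ram_mb - reserved_mb
    # Round down so the estimate never exceeds available memory,
    # but always allow at least one process.
    return max(1, int(available_mb // avg_process_mb))


# Illustrative numbers only: 4 GB server, 1 GB reserved, 40 MB per process.
suggested = estimate_max_clients(4096, 1024, 40)
print("Suggested MaxClients:", suggested)
```

With a cap like this in place, a load spike makes requests queue rather than pushing the server into swap or out of memory.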

The data provided by Dawn helped us quickly identify the source of the problem and take action to resolve it. That saved about 2-3 hours of developer time and avoided potential site downtime, which would have cost the client thousands of dollars in lost revenue.

Later, in August, we found another problem. Dawn was reporting a high load average and high Apache process counts, and it sent us email and SMS alerts about the issue. By looking at Apache's mod_status page, we were able to see that the problem was due to a large number of requests sitting in the "Reading Request" state, which suggested a denial-of-service attack. We solved it simply by blocking the offending IP address.
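The same check can be scripted. mod_status offers a machine-readable view (the status URL with "?auto" appended) that includes a "Scoreboard:" line, where each character is one worker slot: "R" means Reading Request, "_" means waiting for a connection and "." is an open slot. The sketch below counts those characters in a piece of status text. The function names and the sample text are ours, invented for illustration, and this is not part of Dawn.

```python
from collections import Counter


def scoreboard_counts(status_text):
    """Count each worker state in the 'Scoreboard:' line of mod_status output."""
    for line in status_text.splitlines():
        if line.startswith("Scoreboard:"):
            return Counter(line.split(":", 1)[1].strip())
    return Counter()  # no scoreboard line found


def reading_fraction(status_text):
    """Fraction of busy workers that are in the 'Reading Request' state."""
    counts = scoreboard_counts(status_text)
    # '_' (waiting) and '.' (open slot) are idle; everything else is busy.
    busy = sum(n for state, n in counts.items() if state not in "_.")
    return counts["R"] / busy if busy else 0.0


# Made-up sample resembling the '?auto' output.
sample = "Total Accesses: 10\nScoreboard: RRRW__..\n"
print(scoreboard_counts(sample)["R"], reading_fraction(sample))
```

A high fraction on its own does not prove an attack, but combined with a high load average it is a good prompt to look at which client addresses are holding those connections open.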

Before Dawn, problems such as these would often be solved only by a painstaking process of trial and error. In both cases, Dawn detected the problem, saved developer time and reduced downtime.
