Now With Go-Faster Stripes

If you've ever had a look at the Status page on the CPAN Testers Reports site, you will likely have noticed that the graphs typically show 4-5 lines on any given day. This has been pretty much the case since I added this monitoring feature, and reflected the fact that it could take up to 5 days for a less common page to be rebuilt.

However, over the last 2 weeks that has been changing, to the point that from about 4pm CET yesterday (24/11/2010) the builder only had requests less than 24 hours old. It appears there are three reasons for this.

The first is that we have seen a reduction in report submissions over the past few weeks. Having said that, the number of submissions during October was rather substantial, topping 500,000, so it's not too surprising to see a reduction. And to be fair, looking at the Monthly Stats, we have already had over 300,000 report submissions this month, so it's not been a quiet month either.

CPU Usage

Secondly, the Reports site has been altered to reduce hits from robots. With around 15-20 crawlers starting to hit the site at once, processing was occasionally affecting other areas of the build process, as well as the backup processes. As such, 'rel="nofollow"' has been added as an attribute to links for RSS, YAML and JSON files. This had a dramatic effect, as can be seen in the CPU graph, to the point that the server load dropped to under 2.0 for long periods for the first time since I set up the server! The change essentially means crawlers now only reference just under 10 million pages, rather than over 20 million, and don't pull several gigabytes of data off the server each day.
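As a rough illustration of the kind of change described above, here is a minimal Python sketch that adds 'rel="nofollow"' only to links pointing at data feeds. This is not the site's actual code: the function name, the extension list and the example paths are all assumptions made for the sake of the example.

```python
from html import escape

# Hypothetical list of feed formats, taken from the post (RSS, YAML, JSON).
FEED_EXTENSIONS = (".rss", ".yaml", ".json")


def render_link(href: str, text: str) -> str:
    """Return an anchor tag, adding rel="nofollow" for data-feed links."""
    is_feed = href.lower().endswith(FEED_EXTENSIONS)
    rel = ' rel="nofollow"' if is_feed else ""
    return f'<a href="{escape(href, quote=True)}"{rel}>{escape(text)}</a>'


# Example paths are made up for illustration only.
print(render_link("/distro/A/App-Example.json", "JSON"))
print(render_link("/distro/A/App-Example.html", "Reports"))
```

Well-behaved crawlers treat the attribute as a hint not to follow the link, so the HTML pages stay indexable while the bulk data files are skipped.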

Thirdly, the bzip2 process to archive the backup databases now only happens once a day. With the reduction in server hits, this now takes far less time to process, and no longer has a prolonged effect on the build process. Previously it could take over 2 hours to compress the archive 6 times a day, and now takes about 25 minutes once a day.
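To put those figures in perspective, here is a back-of-the-envelope calculation in Python. It assumes that each of the six daily runs took roughly 2 hours, which is one reading of the numbers above; the function name is purely illustrative.

```python
def daily_minutes(runs_per_day: int, minutes_per_run: float) -> float:
    """Total minutes per day spent compressing the backup archive."""
    return runs_per_day * minutes_per_run


# Assumption: six runs of about 2 hours (120 minutes) each, as a worst case.
before = daily_minutes(6, 120)
# From the post: one run of about 25 minutes.
after = daily_minutes(1, 25)

print(f"before: {before:.0f} min/day, after: {after:.0f} min/day")
print(f"reduction: {1 - after / before:.1%}")
```

Under that assumption the server went from spending up to half of each day compressing backups to well under half an hour.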

In addition there have been some minor tweaks to the build process, but the major changes are still waiting in the wings, as the current data stores need to fully update to allow me to implement them.

Every so often we get asked why a particular report hasn't appeared on the Reports site. Depending on the sync process, a report can take anything between a few minutes and an hour to reach the site. However, as most people watch for reports via the Distro or Author pages, it could previously take up to 5 days for a report to appear there. That is now down to less than a day. That still isn't quite quick enough, which is why some further improvements will hopefully be implemented over the next week.

Expect some more updates soon on the next set of changes and some of the proposed changes for the future.

Cross-posted from the CPAN Testers Blog.

1 Comment

Cool, great to see some of these optimizations helping to reserve CPU load for other things, instead of just serving to bots. Thanks for sharing!


About CPAN Testers

This is the new account for incidental and summary updates to what's happening with the CPAN Testers. For all the latest news and views please see our blog.