CPAN Testers Server - Update 14/09/2011
Initial checks on the databases highlighted some discrepancies, which have since been fixed. The databases have been archived and are now rebuilding. It is hoped that this will be completed within the next few days.
Once the databases are all rebuilt and sync'ed, the websites will slowly be switched back on. The first sites to appear will be the Statistics and Devel sites, with the Reports website coming back online once the bulk of the support files (JSON, JS & HTML) have been recreated.
The CPAN Testers server is also one of the Tier-1 fast mirrors for CPAN. As this is quite important for a number of services, it was the first part of the server to be rebuilt. Finding a suitable BACKPAN seed has proved troublesome, as apart from the FUNET server there are no public rsync mirrors. While the FUNET server has previously been fine for seeding, David Cantrell highlighted that some of the timestamps in the repository are incorrect. It appears someone has touched some of the files on the FUNET server, without realising the consequences. As such, the current BACKPAN repo may not correctly list the upload dates.
Despite several traumas, particularly with permissions, the repos for BACKPAN and CPAN are now available for FTP and rsync access. Note that the FTP and rsync (as well as HTTP .. more on that in a moment) paths/modules all use the capitalised versions of BACKPAN and CPAN. This mostly affects rsync, as the previous modules were lower case.
All websites, including the HTTP access to BACKPAN and CPAN, are currently unavailable. With the databases rebuilding, disk IO is at full throttle. Apache unfortunately also tries to access many files, particularly for logging, and the load on the server is impacted considerably. As such, to allow the databases to rebuild as quickly as possible, the webserver will remain turned off.
After turning on FTP access I was quite intrigued to see Google attempting a denial of service on the server. Having approximately 50 bots all trying to scan the FTP directories was not good. I have now blocked Googlebot, and will do the same for any other bot that I see using the FTP or rsync repos. I don't have a problem with a single connection scanning the directories, or anyone requiring a full archive download, but any IP blocks trying to access the server all at once will be blocked.
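As a rough illustration of spotting an IP block that is hitting the server all at once, here is a minimal Python sketch. It is not the check actually used on the server: the function name busy_blocks, the /24 grouping, the limit, and the sample addresses (taken from the reserved documentation ranges) are all assumptions made for the example.

```python
from collections import Counter
from ipaddress import ip_network


def busy_blocks(client_ips, prefix=24, limit=1):
    """Group client IPs into networks of the given prefix length and
    return {network: connection_count} for networks above `limit`.

    Hypothetical helper: the prefix and limit are illustrative only.
    """
    counts = Counter(
        # strict=False lets us pass a host address and get its network back
        str(ip_network(f"{ip}/{prefix}", strict=False))
        for ip in client_ips
    )
    return {net: n for net, n in counts.items() if n > limit}


# Sample data uses documentation-only addresses, not real crawler IPs.
connections = ["192.0.2.10", "192.0.2.11", "192.0.2.12", "198.51.100.7"]
flagged = busy_blocks(connections, prefix=24, limit=2)
print(flagged)
```

A single connection from a block never exceeds the limit, so one client scanning the directories or pulling a full archive would not be flagged. Only blocks with several simultaneous connections show up in the result.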
We're getting there, but it's just taking a little longer than I'd hoped to get ourselves back online. If this has taught me anything it's that, while database and source code backups are all well and good, a complete and regular backup of web directories and config files is also extremely useful!
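For anyone wanting to act on that lesson, a backup of web directories and config files could be as simple as the Python sketch below. It is only an illustration under stated assumptions: the function name backup_dirs, the archive naming scheme, and the demo directory layout are all made up for the example and do not describe the CPAN Testers server.

```python
import tarfile
import tempfile
import time
from pathlib import Path


def backup_dirs(sources, dest_dir):
    """Write the given files/directories into one timestamped .tar.gz
    under dest_dir and return the archive path.

    Hypothetical helper: sources that do not exist are skipped, and each
    entry is stored under its base name, so base names should be unique.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    archive = dest / f"site-backup-{stamp}.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        for src in map(Path, sources):
            if src.exists():
                tar.add(src, arcname=src.name)
    return archive


# Demo against a throwaway directory rather than real server paths.
with tempfile.TemporaryDirectory() as tmp:
    web = Path(tmp) / "htdocs"
    web.mkdir()
    (web / "index.html").write_text("hello")
    archive = backup_dirs([web, Path(tmp) / "missing"], Path(tmp) / "backups")
    with tarfile.open(archive) as tar:
        names = sorted(tar.getnames())
print(names)
```

Run regularly from a scheduler, and with the archive copied off the machine, something along these lines would cover the files that database and source code backups miss.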
More news soon.
Cross-posted from the CPAN Testers Blog