A Static Archive of use.perl.org

In my previous post I added a footnote that "use.perl.org is difficult to get info out of now, it's basically dead. A lot of content is lost". That turned out to be not entirely true; it just needed some work to get to it.

In the reddit comments, brian d foy mentioned Léon's WWW::UsePerl::Server, a module to host the use.perl.org archive.

I grabbed the archive and Léon's module, installed all the deps, and got it up and running after hacking the module to work with the latest Catalyst. I then combined some sed, awk, perl, and SQL to create a static version of the site with URLs that allow it to function correctly: https://use-perl.github.io/. Note that changing the URL structure to get it to work a) as a static site and b) on GitHub Pages means that any "permalinks" that exist elsewhere will need manual fiddling to reach the page in question. The new URL structure should be relatively obvious, however.

The site is 99% there. There appears to be some mojibake, which I suspect is in the original mysqldump file, but I haven't confirmed that. I also need to fix some self-references to use.perl.org in entries and comments, stripping the domain so they resolve to the current one. But it's 99% there: about 40,000 blog entries, a lot of Perl, a lot of interesting history.
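As a rough idea of what stripping those self-references could look like, here is a minimal sketch in Python. It is not the tooling actually used for the site. It assumes the self-references show up as absolute http(s) URLs inside href or src attributes, and that the path after the domain is already valid on the static site; `strip_self_references` is a hypothetical helper name.

```python
import re

# Assumption: self-references are absolute URLs in quoted href/src attributes.
SELF_REF = re.compile(
    r"""(?P<attr>\b(?:href|src)\s*=\s*)(?P<quote>["'])"""
    r"""https?://use\.perl\.org(?P<path>[^"']*)(?P=quote)""",
    re.IGNORECASE,
)


def strip_self_references(html: str) -> str:
    """Turn absolute use.perl.org links into root-relative ones."""

    def repl(match: re.Match) -> str:
        # A bare domain with no path becomes the site root.
        path = match.group("path") or "/"
        if not path.startswith("/"):
            # Something like use.perl.org.example.com: leave it untouched.
            return match.group(0)
        quote = match.group("quote")
        return f"{match.group('attr')}{quote}{path}{quote}"

    return SELF_REF.sub(repl, html)
```

Root-relative links resolve against whichever domain serves the page, so the same output should keep working if the archive later moves to another hostname.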

It's all in a git repo at https://github.com/use-perl/use-perl.github.io, which you can use if you want to search the content (it is now static, after all). Or file a PR with a fix.

Happy Xmas spelunking

7 Comments

This is very cool, and gets that off my to-do list for this week.

Now I'm going back to read some of my journals to see if I had anything interesting to say. So far it seems that most things should be lost to time. :)

Excellent work, thanks. Now I need to seriously think about doing the same for this site!

Is it worth speaking to Robert and Ask and asking them if they would consider pointing the old use.perl.org domain at your Github site?

See https://noc.perl.org for contact details for perl.org infrastructure.

I’d prefer not pointing use.perl.org directly at this mirror due to the difference in URL structure. But it would be nice to point the domain at a redirector that 301s old URLs to the URLs on this new mirror, the way search.cpan.org now answers with 301s to MetaCPAN.

This particular mirror should then live at a different CNAME (maybe archive.use.perl.org), so that new links to the static mirror aren’t tied to GitHub by domain. The Perl NOC doesn’t control that domain and has no way of pointing it somewhere else if future changes to that service would otherwise kill the links.

Excellent! The icing on the cake would be if the redirect mapping also included the later static(ish?)-site URL structure:

Old: http://use.perl.org/use.perl.org/_Aristotle/journal/33448.html
New: https://use-perl.github.io/user/Aristotle/journal/33448/

It would be ideal if the redirector’s 404 logs were accessible, to see whether there are more broken links out there that still get traffic (or at least attempts), but I guess that is actually a question for whoever hosts the thing.

(I just found out from Google that this mirror is still reachable as
http://images.use.perl.org/use.perl.org/_Aristotle/journal/33448.html
… which is getting a little ridiculous, given all the URLs under which the content has been reachable over time. I suppose running the same redirector under both use.perl.org and images.use.perl.org would take care of that. It’s not 100% correct, in that the original-style article URLs were (I think) never reachable on the images.use.perl.org domain… but who cares.)
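For illustration, the Old/New pair above could be handled by a mapping function along these lines. This is only a sketch: it assumes every journal entry follows the same pattern as the single example given, it covers only that one URL style, and `redirect_target` is a hypothetical name. A real redirector would answer with a 301 pointing at the returned URL and log a 404 when the function returns `None`.

```python
import re
from typing import Optional
from urllib.parse import urlsplit

NEW_BASE = "https://use-perl.github.io"

# Both hosts under which the static(ish) mirror has been reachable.
OLD_HOSTS = {"use.perl.org", "images.use.perl.org"}

# Assumption: all journal entries look like /use.perl.org/_<user>/journal/<id>.html
JOURNAL_PATH = re.compile(
    r"^/use\.perl\.org/_(?P<user>[^/]+)/journal/(?P<id>\d+)\.html$"
)


def redirect_target(old_url: str) -> Optional[str]:
    """Return the new mirror URL for an old journal URL, or None if unknown."""
    parts = urlsplit(old_url)
    if (parts.hostname or "").lower() not in OLD_HOSTS:
        return None
    match = JOURNAL_PATH.match(parts.path)
    if match is None:
        return None  # candidate for the 404 log
    return f"{NEW_BASE}/user/{match.group('user')}/journal/{match.group('id')}/"
```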

About Lee J

I blog about Perl.