HTML Archives

Increasing Perl’s Visibility, Redux

Quite a while ago, I blogged about how Perl projects should have websites to increase not only their visibility, but the visibility of Perl as a whole.

Perl has had the CPAN and awesome websites like MetaCPAN and its predecessor search.cpan.org for a long time. So, unlike in other programming language ecosystems, many Perl projects have felt no need to start their own websites for documentation, package downloads, and community: all these things were already provided.

However, I do feel that this centralization keeps Perl content on the Internet very isolated and makes Perl less visible than other programming languages.

Web Scraping with Zydeco

I like to keep local copies of my blogs.perl.org blog posts as Atom entries, but noticed yesterday that I had a few gaps in my collection. The Atom feeds offered by blogs.perl.org only include the most recent articles, though, so I decided to write a quick script to scrape the posts. Luckily, I managed to get a table containing the URLs for each post I needed, so I didn't need to bother with following links to find the pages; I just needed to grab the content from them.

I thought some people might find the code interesting, especially for its use of lazy attributes. This is one of those "it only needs to be used once, so making the code maintainable isn't important" kinds of projects, so do bear that in mind. I've cleaned up the whitespace and added comments for this blog post, but other than that, it's just a quickly hacked-together script.
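For readers unfamiliar with the term, a lazy attribute is one whose value is only computed the first time it is asked for, and is then cached. The sketch below illustrates the idea in plain core Perl; it is not the script from the post and does not use Zydeco. The `Post` class, its `content` attribute, and the placeholder builder are all hypothetical names chosen for illustration.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

package Post;

my $builds = 0;    # counts how often the builder runs (illustration only)

sub new {
    my ( $class, %args ) = @_;
    return bless { url => $args{url} }, $class;
}

sub url { $_[0]{url} }

# Lazy attribute: nothing is computed until the first call,
# and the result is cached in the object afterwards.
sub content {
    my $self = shift;
    $self->{content} = $self->_build_content
        unless exists $self->{content};
    return $self->{content};
}

# Hypothetical builder; a real scraper would fetch and parse the page here.
sub _build_content {
    my $self = shift;
    $builds++;
    return '<article>placeholder for ' . $self->url . '</article>';
}

sub build_count { $builds }

package main;

my $post = Post->new( url => 'https://example.org/post-1' );
print $post->content, "\n" for 1 .. 2;    # the builder runs on the first call only
print 'builder calls: ', Post::build_count(), "\n";
```

In a scraper, this pattern means a page is only fetched if something actually asks for its content, and never fetched twice for the same object.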

Processing schema.org markup with Perl

Someone on IRC asked me for an example of how to parse schema.org markup using my HTML::HTML5::Microdata::Parser module, so here is one. It pulls the microdata from the page and queries it using SPARQL.

#!/usr/bin/env perl

use ...

About Toby Inkster

I'm tobyink on CPAN, IRC and PerlMonks.