Easy DOM parsing with Mojo::DOM

By tempire on February 6, 2011 6:14 PM

Long ago, I used regexes to parse HTML.

They told me it was evil. They told me it was not maintainable. They were right.

But the alternatives were painful. They were clunky. They required me to change the way I approached HTML. They required me to abandon the hipness of CSS selectors I had embraced with JavaScript libraries, and ignore the many years spent perfecting my CSS-fu.

HTML::Parser, HTML::TreeBuilder, I'm sure you're brilliant in your own way. I'm sure you have conquered many lands, and for those who wanted to adapt to your mindset, you brought much happiness.

I wanted a simpler way. jQuery taught us that CSS selectors are that simpler way.

Fortunately, Mojo::DOM sprouted up out of the land of cookies and rainbows and unicorns.

Seriously, have you ever seen HTML retrieved, parsed, and processed so nicely in Perl?

Since most of you have used jQuery and/or similar JavaScript libraries, you already know how to use Mojo::Client/DOM. You simply apply your existing Perl & JavaScript knowledge, and you're done.

You're not relegated to parsing only web-requested data; you can use Mojo::DOM directly on any HTML string you already have.
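Here is a minimal sketch of that direct use, with no web request involved. The HTML fragment and the selectors are invented for illustration, and the method calls (`at`, `find`, `map`, `each`, `attr`, `text`) assume a reasonably recent Mojolicious release; older versions may differ.

```perl
use strict;
use warnings;
use Mojo::DOM;

# Hypothetical HTML fragment, made up for this example
my $html = <<'HTML';
<div id="news">
  <h1>Headlines</h1>
  <ul>
    <li><a href="/one">First story</a></li>
    <li><a href="/two">Second story</a></li>
  </ul>
</div>
HTML

# Parse the string into a DOM tree
my $dom = Mojo::DOM->new($html);

# at() returns the first element matching a CSS selector
my $title = $dom->at('#news h1')->text;

# find() returns a collection of every match; map() transforms each element
my @links = $dom->find('#news li a')
                ->map(sub { $_->attr('href') . ' ' . $_->text })
                ->each;

print "$title\n";
print "$_\n" for @links;
```

The selectors are the same ones you would hand to jQuery, which is the whole point: nothing new to learn beyond a handful of method names.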

Installing is one-step easy:
curl -L cpanmin.us | perl - Mojolicious

So now you know: you can leave the cruftiness behind, and hang with the hippest of the hip - there's no need to hide your head in shame when talking with hipsters about the latest Ruby shine. Shine is external; your Mojo runs deep.

Mojo::DOM docs

5 comments

mpeters.myopenid.com | February 6, 2011 8:31 PM

pQuery has been around for a while and so has HTML::Query and they both allow you to use CSS selectors to manipulate HTML. And they can both be used outside of Mojolicious.

tempire | February 6, 2011 9:49 PM

That may be; I'm sure both the modules you've mentioned are fully capable, though I like having a sleek toolkit that provides the retrieval, processing, and parsing, and installs in about a minute without the XS prerequisites.

Plus, the homepage has arbitrary unicorns on it.

Ron Savage | February 7, 2011 1:59 AM

Another possibility is to use:
http://search.cpan.org/~jkegl/Marpa-HTML-0.102000/

ghartz.myopenid.com | February 7, 2011 10:17 AM

The idea behind this is really nice, but sadly the code and comments in Mojo/DOM.pm are a really special kind of crazy.

mateu.myopenid.com | February 9, 2011 9:11 PM

Thanks for the introduction to Mojo::DOM. I've recently been looking for such a tool. I'm finding it and HTML::Zoom both useful for selecting and manipulating nodes on a given chunk of HTML.

About tempire

I do not like the status quo. There is always a better way; the question is whether you care enough to find it.