My continuing dream of a Perl XML(::Twig) cookbook
I ran into Michel Rodriguez at YAPC, so I started talking to him about my idea for an XML::Twig cookbook. I really, really love his module and want more people to use it. It takes a bit to get used to, so I think it's ideally suited for some cookbook-style documentation.
His advice is that XML::Twig is what you use when parsing and futzing with XML is not the primary purpose of your program (e.g., you have some config files to read). That turned into a bit of a discussion with those around us about when you should use which XML modules, and that we should answer the same cookbook recipes with examples from multiple modules (like a Bobby Flay cook-off, I guess).
- Use XML::Simple if it does exactly what you want. When it doesn't do exactly what you want (i.e. you have to configure it), stop using it.
- Use XML::Twig for light XML tasks, mild processing, etc.
- Use the low-level XML modules when you need to get much more out of the process and you need more flexibility than the higher-level interfaces provide.
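As a rough illustration of the "light XML task" case, here is a minimal sketch of reading a small config file with XML::Twig handlers. The config layout, element names, and attribute names are invented for the example; only the XML::Twig calls themselves (new, twig_handlers, parse, att, text) are the module's real interface.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical config document; the element and attribute names are assumptions.
my $xml = <<'XML';
<config>
  <server name="alpha" port="8080"/>
  <server name="beta" port="9090"/>
  <logfile>/var/log/app.log</logfile>
</config>
XML

my %port_for;
my $logfile;

my $twig = XML::Twig->new(
    twig_handlers => {
        # Each handler runs once its element has been completely parsed
        server => sub {
            my ($t, $elt) = @_;
            $port_for{ $elt->att('name') } = $elt->att('port');
        },
        logfile => sub {
            my ($t, $elt) = @_;
            $logfile = $elt->text;
        },
    },
);
$twig->parse($xml);

print "$_ => $port_for{$_}\n" for sort keys %port_for;
print "logfile: $logfile\n";
```

The handlers pull out just the bits the program cares about, which fits the case where XML is incidental to what the program actually does.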
This gets back to my standard argument that there can never be one XML (or CGI or web framework or ...) interface because there isn't just one XML task that people have to do. People tend to stick with the XML module (CGI module, web framework) that they learn first and force every task into that way of thinking. I'd like to see a Consumer Reports style guide for these things.
Use XML::Simple, but anything beyond using the "forcearray" option means it's time to bust out more specialized modules. Not configuring it at all seems a little hasty to me.
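For anyone who hasn't hit the problem that option solves, here is a small sketch of what it changes. The sample documents are made up; the point is only that a single child element comes back as a scalar unless ForceArray is told to keep it as an array.

```perl
use strict;
use warnings;
use XML::Simple;

# Made-up documents: the same structure with one item and with two items
my $one = '<list><item>a</item></list>';
my $two = '<list><item>a</item><item>b</item></list>';

# Without ForceArray, a lone <item> is folded into a plain scalar
my $plain = XMLin($one);

# With ForceArray, <item> is an array reference no matter how many there are
my $forced_one = XMLin($one, ForceArray => ['item']);
my $forced_two = XMLin($two, ForceArray => ['item']);

print "plain:      ", (ref $plain->{item} || 'scalar'), "\n";
print "forced one: ", ref $forced_one->{item}, "\n";
print "forced two: ", scalar @{ $forced_two->{item} }, " items\n";
```

Without the option, code that loops over the items breaks as soon as a document happens to contain only one of them.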
XML::TreePuller for when it needs to go really really fast.
I've recently become a big fan of XML::XPath. Once you figure out the API, it's very easy to use XPaths to grab useful bits of stuff from XML. I've been using it for everything from extracting interesting bits out of RSS and Atom feeds to processing some custom/ad-hoc XML formats.
XML::DT is my bitch. It has some functionality in common with Twig and Simple, but I find it easier to use. And it's mine...
Mike:
Use XML::LibXML instead of XML::XPath. It has the same API, is both faster and leaner, and doesn’t flagrantly violate specs.
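A minimal sketch of the kind of feed extraction mentioned above, done with XML::LibXML. The feed content is invented, and it is a namespace-free RSS 2.0 shape; an Atom feed would need its namespace registered (for instance through XML::LibXML::XPathContext) before the paths would match.

```perl
use strict;
use warnings;
use XML::LibXML;

# Invented RSS 2.0 style document, used only for illustration
my $xml = <<'XML';
<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <item><title>First post</title><link>http://example.com/1</link></item>
    <item><title>Second post</title><link>http://example.com/2</link></item>
  </channel>
</rss>
XML

my $doc = XML::LibXML->load_xml(string => $xml);

# findvalue returns the string value of whatever the XPath expression selects
my $feed_title = $doc->findvalue('/rss/channel/title');

# findnodes returns the matching nodes; each node accepts relative XPath in turn
my @titles = map { $_->findvalue('title') } $doc->findnodes('/rss/channel/item');

print "$feed_title\n";
print "  $_\n" for @titles;
```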
Mike:
Most XML modules let you use XPath, either natively or with an XPath add-on: XML::LibXML, XML::Twig::XPath, XML::DOM::XPath... XML::XPath is not maintained (look at the RT queue) and I wouldn't advocate using it.
So in 5 comments we already have 5 different modules being mentioned!
I think then a good first step would be to create a list of tasks to perform, if possible representative of real-world problems, interesting and/or tricky. Then we can each present solutions with our favorite module, then agree that XML::Twig is really the best one ;--)
Just to add to the list of modules mentioned: if you like to use Moose, you prefer XPath, and your XML content fits in memory, my XML::Rabbit module could be worth a look. I like that it makes it super easy to quickly extract single values, arrays, and hashes of data from XML with simple XPath queries.