Another approach to XML processing

The buzz around XML has passed and we are left with a lot of Perl modules to process XML in different ways. I was surprised to still find a gap for another XML processing module.

Common schema-less approaches to XML processing in Perl seem to use either XML::LibXML, to get a full XML DOM or a stream of parsing events, or XML::Simple (better used as XML::LibXML::Simple). XML::Simple transforms between XML and Perl data structures, but it was designed for "data-oriented" XML where element order does not matter much. With XML::Struct I created something like XML::Simple for "document-oriented" XML.

While XML::Simple uses hashes (or hashes of arrays when elements are repeated), XML::Struct uses arrays to represent XML data. This is best illustrated by an example:

<root>
  <foo>text</foo>
  <bar key="value"> <!-- mixed content here: -->
    text
    <doz/>
  </bar>
</root>

is transformed to this structure:

[
  root => { }, [
    [ foo => { }, "text" ],
    [ bar => { key => "value" }, [
      "text", 
      [ doz => { } ]
    ] ]
  ]
]
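
Because every element is just an array of name, attribute hash, and children, such a structure can be walked with a few lines of plain Perl. The following is a minimal sketch, not code from the distribution: the helper element_names is hypothetical, and the structure is typed in by hand rather than produced by a parser. It assumes that children may be missing, a plain string, or an array reference, as in the example above.

```perl
use strict;
use warnings;

# Hand-written copy of the structure shown above. Each element is
# [ name => \%attributes, $children ], where $children is missing,
# a plain text string (as for "foo"), or an array reference.
my $root = [
    root => {}, [
        [ foo => {}, "text" ],
        [ bar => { key => "value" }, [
            "text",
            [ doz => {} ],
        ] ],
    ],
];

# Hypothetical helper: list all element names in document order.
sub element_names {
    my ($element) = @_;
    my ($name, $attributes, $children) = @$element;

    # Normalize the children to a plain list.
    my @kids = !defined($children) ? ()
             : ref($children)      ? @$children
             :                       ($children);

    my @names = ($name);
    for my $child (@kids) {
        next unless ref($child);    # skip text nodes
        push @names, element_names($child);
    }
    return @names;
}

my @names = element_names($root);
print join(", ", @names), "\n";
```

The names come out in the same order as in the XML source, which is the property a hash-based representation cannot guarantee.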

XML attributes are transformed to hashes, which can also be omitted with attributes => 0. If you still want a key-value structure for (parts of) a document, use hashifyXML. The distribution contains methods for both parsing and serializing, based on XML::LibXML. XML is processed as a stream, so one can also extract chunks from very large XML files.
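
To make the idea of a key-value view more concrete, here is a rough sketch in plain Perl of how an element in the array form could be folded into a hash. This is only an illustration under my own assumptions: to_hash is a hypothetical helper, not the implementation of hashifyXML, and the actual output of hashifyXML may differ. The sample record is made up for the example.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of XML::Struct: fold one element of the
# form [ name => \%attributes, $children ] into a hash reference.
sub to_hash {
    my ($element) = @_;
    my ($name, $attributes, $children) = @$element;

    my @kids = !defined($children) ? ()
             : ref($children)      ? @$children
             :                       ($children);

    my %hash = %{ $attributes || {} };    # attributes become keys
    for my $child (@kids) {
        next unless ref($child);          # text in mixed content is dropped
        my ($child_name, $child_attr, $child_kids) = @$child;

        # Text-only children keep their string, others are folded recursively.
        my $value = (defined($child_kids) && !ref($child_kids))
                  ? $child_kids
                  : to_hash($child);

        # Repeated names are collected into an array, in order of appearance.
        if (exists $hash{$child_name}) {
            $hash{$child_name} = [ $hash{$child_name} ]
                unless ref($hash{$child_name}) eq 'ARRAY';
            push @{ $hash{$child_name} }, $value;
        } else {
            $hash{$child_name} = $value;
        }
    }
    return \%hash;
}

# Made-up sample data in the array form described above.
my $record = [
    record => { id => "42" }, [
        [ title  => {}, "Example" ],
        [ author => {}, "Alice" ],
        [ author => {}, "Bob" ],
    ],
];

my $hash = to_hash($record);
print "$hash->{title} by ", join(" and ", @{ $hash->{author} }), "\n";
```

Note what gets lost in this sketch: the relative order of different child elements and any text in mixed content. That is fine for data-oriented parts of a document and is the reason to keep the array form everywhere else.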

Comments, bug reports, extensions etc. are welcome, especially at https://github.com/nichtich/XML-Struct.

1 Comment

Exactly what I wanted a little while ago. Keeping the order intact is the best part of it.

About Jakob

Research & Development at a German library union network.