Another approach to XML processing

By Jakob on September 9, 2013 9:52 AM

The buzz around XML has passed and we are left with a lot of Perl modules to process XML in different ways. I was surprised to still find a gap for another XML processing module.

Common schema-less approaches to XML processing in Perl seem to use either XML::LibXML, to get a full XML DOM or a stream of parsing events, or XML::Simple (better used as XML::LibXML::Simple). XML::Simple transforms between XML and Perl data structures, but it was designed for "data-oriented" XML, where element order does not matter much. With XML::Struct I created something like XML::Simple for "document-oriented" XML.

While XML::Simple uses hashes (or hashes of arrays when elements are repeated), XML::Struct uses arrays to represent XML data. This is best illustrated by an example:

<root>
  <foo>text</foo>
  <bar key="value"> <!-- mixed content here: -->
    text
    <doz/>
  </bar>
</root>

is transformed to this structure:

[
  root => { }, [
    [ foo => { }, "text" ],
    [ bar => { key => "value" }, [
      "text",
      [ doz => { } ]
    ] ]
  ]
]
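Because every element is a plain array of name, attribute hash, and children, the structure can be walked with core Perl alone. The following is a minimal sketch, not part of XML::Struct: the helper name element_names is made up for illustration, and the structure is the one shown above written out as a Perl literal.

```perl
use strict;
use warnings;

# The example structure as a Perl literal:
# each element is [ name => { attributes }, children ].
my $doc = [
    root => {}, [
        [ foo => {}, "text" ],
        [ bar => { key => "value" }, [
            "text",
            [ doz => {} ],
        ] ],
    ]
];

# Hypothetical helper (not provided by XML::Struct): collect element
# names in document order by walking the nested arrays depth-first.
sub element_names {
    my ($element) = @_;
    my ($name, $attributes, $children) = @$element;

    # Children may be missing (empty element), a plain string,
    # or an array reference mixing strings and child elements.
    my @kids = ref($children) eq 'ARRAY' ? @$children : ();

    # Only array references are child elements; strings are text.
    return ($name, map { element_names($_) } grep { ref $_ } @kids);
}

print join(" ", element_names($doc)), "\n";   # prints "root foo bar doz"
```

The names come out in the order the elements appear in the document, which is the property a hash-based structure cannot guarantee.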

XML attributes are transformed to hashes; they can be omitted with the option attributes => 0. If you still want a key-value structure for (parts of) a document, use hashifyXML. The distribution contains methods for both parsing and serializing, based on XML::LibXML. XML is processed as a stream, so one can also extract chunks from very large XML files.
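To make the idea of a key-value structure concrete, here is a rough sketch in core Perl of how an ordered element could be folded into a hash. It is an assumption-laden illustration, not the implementation of hashifyXML: the helper name to_hash and its rules (child elements grouped into arrays by name, text dropped when child elements are present, attributes never overwriting element keys) are invented for this example.

```perl
use strict;
use warnings;

# Hypothetical helper, not the module's implementation: fold an ordered
# element [ name => { attributes }, children ] into a key-value structure.
sub to_hash {
    my ($element) = @_;
    my ($name, $attributes, $children) = @$element;
    my %attr = %{ $attributes || {} };

    # A plain string child is treated like a single text node.
    my @kids = ref($children) eq 'ARRAY' ? @$children
             : defined $children         ? ($children)
             :                             ();
    my @elements = grep { ref $_ } @kids;

    # No attributes and no child elements: the element becomes its text.
    return join('', @kids) if !@elements && !%attr;

    # Child elements are grouped by name, since names may repeat.
    # Text mixed in between child elements is dropped in this sketch.
    my %hash;
    push @{ $hash{ $_->[0] } }, to_hash($_) for @elements;

    # Attributes are added last and never overwrite a child element key.
    for my $key (keys %attr) {
        $hash{$key} = $attr{$key} unless exists $hash{$key};
    }
    return \%hash;
}

my $bar  = [ bar => { key => "value" }, [ "text", [ doz => {} ] ] ];
my $hash = to_hash($bar);   # { key => "value", doz => [ "" ] }
```

The mixed-content "text" inside bar is lost here, which shows why a key-value view suits only the data-oriented parts of a document.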

Comments, bug reports, extensions etc. are welcome, especially at https://github.com/nichtich/XML-Struct.

Tagged as:

xml

1 Comment

Mohammad S Anwar | September 12, 2013 1:28 PM

Exactly what I wanted a little while ago. Keeping the order intact is the best part of it.

About Jakob

Research & Development at a German library union network.