Using XML::Compile to output XSD compliant XML

As part of a recent project I was given an XSD (XML Schema Definition) file and asked to output XML that complies with it. CPAN to the rescue: I found XML::Compile::Schema, a cool module that let me do this with very little fuss. The documentation is really good, but I think a tutorial-style post might be helpful.

To do this you’ll need to install XML::Compile (which includes XML::Compile::Schema) and XML::LibXML.

You can use XML::Compile::Schema to read in your XSD file and print a Perl hash template. You then use that template as a guide to construct a hash of real data, and XML::Compile::Schema turns that hash into a valid XML file.

For this tutorial, download a sample .xsd file from here and save it as test.xsd. Then write a Perl script like this one to dump a Perl hash template:

#!/usr/local/bin/perl

use warnings;
use strict;

use XML::Compile::Schema;
use XML::LibXML;    # used later to build the output document

my $xsd = 'test.xsd';

# Read the schema definitions from the .xsd file
my $schema = XML::Compile::Schema->new($xsd);

# This will print a very basic description of what the schema describes
$schema->printIndex();

# This will print a hash template that shows you how to construct a
# hash that can be turned into a valid XML file.
#
# Note: the second argument names the element the template describes,
# which is normally the root-level element of the XML document. A schema
# may declare several top-level elements, so you have to say which one
# you want.
warn $schema->template('PERL', 'addresses');
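The download link for the sample file is not preserved here, so below is a self-contained variant that embeds a schema in the script itself. The schema is an assumption: I reconstructed it from the template output shown next (a root element addresses holding one or more address elements, each with optional string elements name and street, no target namespace), so the original test.xsd may differ in details. It relies on XML::Compile::Schema->new accepting an XML string as well as a filename.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

use XML::Compile::Schema;

# Hypothetical reconstruction of test.xsd, inferred from the template
# output in this post; the original file may differ.
my $xsd = <<'XSD';
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="addresses">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="address" maxOccurs="unbounded">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="name"   type="xs:string" minOccurs="0"/>
              <xs:element name="street" type="xs:string" minOccurs="0"/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
XSD

# new() accepts an XML string here instead of a filename
my $schema = XML::Compile::Schema->new($xsd);

# No targetNamespace, so the plain element name is enough
my $template = $schema->template('PERL', 'addresses');
print $template;
```

If the reconstruction is faithful, this should print essentially the same template as the file-based script.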

The relevant output looks like this:

# is an unnamed complex
{ # sequence of address

  # is an unnamed complex
  # occurs 1 <= # <= unbounded times
  address =>
  [ { # sequence of name, street

      # is a xs:string
      # is optional
      name => "example",

      # is a xs:string
      # is optional
      street => "example", }, ], }

The comments are helpful (they are generated by XML::Compile::Schema itself, not written by me). They say your data structure should start as a hash reference containing an entry called “address”, which is a reference to an array. That array must hold at least one element (“occurs 1 <= # <= unbounded times”), and each element is a hash reference with two entries, name and street, both of which are optional.

From this you can deduce that a valid hash will look something like this.

my $data = {
    address => [
        {
            name => 'name 1',
            street => 'street 1',
        },
        {
            name => 'name 2',
            street => 'street 2',
        }
    ],
};
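In a real project the addresses usually come from somewhere else rather than being typed in by hand. Here is a plain-Perl sketch of building the same structure from a list of rows; the @rows input and its [ name, street ] layout are hypothetical stand-ins for whatever your real data source (a CSV file, a database query, ...) returns.

```perl
use strict;
use warnings;

# Hypothetical input rows, each as [ name, street ]
my @rows = (
    [ 'name 1', 'street 1' ],
    [ 'name 2', 'street 2' ],
);

# One hashref per <address> element
my @addresses;
for my $row (@rows) {
    my ($name, $street) = @$row;
    push @addresses, { name => $name, street => $street };
}

# Wrap them in the arrayref under "address", as the template asks
my $data = { address => \@addresses };

print scalar @{ $data->{address} }, " addresses\n";
```

Since the template marks name and street as optional, a hash that leaves out one of those keys should still be accepted by the writer.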

To output the XML, compile a writer for the root element, then pass it an XML::LibXML document and your data:

my $doc    = XML::LibXML::Document->new('1.0', 'UTF-8');
my $write  = $schema->compile(WRITER => 'addresses');
my $xml    = $write->($doc, $data);

$doc->setDocumentElement($xml);

print $doc->toString(1); # 1 indicates "pretty print"
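Put together, the whole write path fits in one script. This sketch reuses the same assumption as before: the embedded schema is my reconstruction from the template output, not necessarily the original test.xsd.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

use XML::LibXML;
use XML::Compile::Schema;

# Hypothetical reconstruction of test.xsd (see the assumptions above)
my $xsd = <<'XSD';
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="addresses">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="address" maxOccurs="unbounded">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="name"   type="xs:string" minOccurs="0"/>
              <xs:element name="street" type="xs:string" minOccurs="0"/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
XSD

my $schema = XML::Compile::Schema->new($xsd);

# Data shaped according to the PERL template
my $data = {
    address => [
        { name => 'name 1', street => 'street 1' },
        { name => 'name 2', street => 'street 2' },
    ],
};

# Compile a writer for the root element; it returns an XML::LibXML element
my $doc   = XML::LibXML::Document->new('1.0', 'UTF-8');
my $write = $schema->compile(WRITER => 'addresses');
my $xml   = $write->($doc, $data);

$doc->setDocumentElement($xml);

my $output = $doc->toString(1);    # 1 = pretty print
print $output;
```

The writer checks the data against the schema as it goes, so a hash that does not match the template should make it die rather than silently produce invalid XML.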

My output looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<addresses>
  <address>
    <name>name 1</name>
    <street>street 1</street>
  </address>
  <address>
    <name>name 2</name>
    <street>street 2</street>
  </address>
</addresses>
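The commenters below ask about going the other way, reading XML into a hash. The same schema object can compile a READER, which takes XML (here a string) and returns the same kind of data structure the writer accepts. This is a sketch under the same reconstructed-schema assumption, with the output above inlined so the script stands alone.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

use XML::Compile::Schema;

# Hypothetical reconstruction of test.xsd (see the assumptions above)
my $xsd = <<'XSD';
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="addresses">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="address" maxOccurs="unbounded">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="name"   type="xs:string" minOccurs="0"/>
              <xs:element name="street" type="xs:string" minOccurs="0"/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
XSD

# The XML produced by the writer, inlined here
my $xml = <<'XML';
<?xml version="1.0" encoding="UTF-8"?>
<addresses>
  <address>
    <name>name 1</name>
    <street>street 1</street>
  </address>
  <address>
    <name>name 2</name>
    <street>street 2</street>
  </address>
</addresses>
XML

my $schema = XML::Compile::Schema->new($xsd);

# A READER is the mirror image of a WRITER: XML in, Perl hash out
my $read = $schema->compile(READER => 'addresses');
my $back = $read->($xml);

for my $addr (@{ $back->{address} }) {
    print "$addr->{name}: $addr->{street}\n";
}
```

Because address may occur more than once, the reader returns it as an array reference even when the document contains a single address, which keeps the hash shape stable.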

The actual XSD and resulting XML files I was dealing with were much more complicated, but I followed this same process and had no trouble whatsoever.

4 Comments

Nice tutorial. I've tried to follow the XML::Compile doco, but it's a big body of work for a busy developer under pressure, so a simple working example is just what I need. What I'm trying to do is read XML into a hash. printIndex() displays:

namespace: http://www.abbyy.com/FineReader_xml/FineReader10-schema-v1.xml
filename: /cygdrive/h/tony.wood/xml/xsd/FineReader10-schema-v1.xsd
definitions of elements:
document
definitions of simpleTypes:
ParagraphAlignment
TableCellBorderType
definitions of complexTypes:
BarcodeType
BlockType

... blah blah ...

TextType
WordRecognitionVariant

but I'm having trouble with

$schema->template('PERL', 'my_top_level_elem');

Could you please publish the XSD you are using?

In my XSD I have:

blah blah...

etc....

In this case,

$schema->template('PERL', 'document');

fails with:

error: cannot find element or attribute `document'

What token do I need to use as the second argument?

Thanks in advance
Tony Wood

Looks like the XSD was lost in the previous comment. Here it is:

targetNamespace="http://www.abbyy.com/FineReader_xml/FineReader10-schema-v1.xml"
xmlns:tns="http://www.abbyy.com/FineReader_xml/FineReader10-schema-v1.xml"
elementFormDefault="qualified">

Schema for representing OCR results exported from FineReader 10.0 SDK. Copyright 2001-2011 ABBYY, Inc.

etc....

I got around this problem by using pack_type from XML::Compile::Util to build the namespace-qualified element name:

use XML::Compile::Util qw/pack_type/;
my $type = pack_type 'http://www.abbyy.com/FineReader_xml/FineReader10-schema-v1.xml', 'document';
warn $schema->template('PERL', $type);
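Why pack_type helps: when a schema declares a targetNamespace, its elements are identified by namespace plus local name, and XML::Compile writes that pair as "{namespace}local". pack_type simply builds that string. The sketch below uses a cut-down, namespaced variant of the addresses schema; the namespace URI http://example.com/addresses is made up for illustration.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

use XML::Compile::Schema;
use XML::Compile::Util qw/pack_type/;

# Hypothetical namespace URI, for illustration only
my $ns = 'http://example.com/addresses';

# Interpolating heredoc so $ns ends up in targetNamespace
my $xsd = <<"XSD";
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="$ns"
           elementFormDefault="qualified">
  <xs:element name="addresses">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="address" type="xs:string" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
XSD

my $schema = XML::Compile::Schema->new($xsd);

# pack_type joins namespace and local name as "{namespace}local"
my $type = pack_type $ns, 'addresses';
print "$type\n";    # prints {http://example.com/addresses}addresses

# The bare name 'addresses' would not be found here, because the element
# lives in the target namespace; the packed name is what works.
my $template = $schema->template('PERL', $type);
print $template;
```

The same packed name should also be what compile(WRITER => ...) and compile(READER => ...) expect for a namespaced schema.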

Thank you for this, and also, thank you to the commenter "tonyw". I had the exact same problem he did, because my schema is namespaced.

Anyway, this post has made a big project I'm working on considerably easier, so thank you. :)


About Brian E. Lozier

I like Perl.