Using XML::Compile to output XSD compliant XML

As part of a recent project I was given an XSD (XML Schema Definition) file and asked to output compliant XML. CPAN to the rescue: I found XML::Compile::Schema, a module that let me do this with very little fuss. The documentation is really good, but I think a tutorial-style post might be helpful.

To do this you’ll need to install XML::Compile and XML::LibXML.

You can use XML::Compile::Schema to read in your XSD file and output a Perl hash template. You then use that template as a guide to build a hash of real data, and have XML::Compile::Schema turn that hash into valid XML.

For this tutorial, download a sample .xsd file from here and save it as test.xsd. Then write a Perl script like this to dump a Perl hash template.

#!/usr/local/bin/perl

use warnings;
use strict;

use Data::Dumper;
use XML::Compile::Schema;
use XML::LibXML;    # provides XML::LibXML::Document, used when writing

my $xsd = 'test.xsd';

my $schema = XML::Compile::Schema->new($xsd);

# This will print a very basic description of what the schema describes
$schema->printIndex();

# This prints a hash template showing how to build the hash that the
# writer will turn into a valid XML file.
#
# Note: the second argument names the root-level element to describe.
# A schema can declare more than one top-level element, so it has to be
# stated explicitly.
warn $schema->template('PERL', 'addresses');

The relevant output looks like this:

# is an unnamed complex
{ # sequence of address

  # is an unnamed complex
  # occurs 1 <= # <= unbounded times
  address =>
  [ { # sequence of name, street

      # is a xs:string
      # is optional
      name => "example",

      # is a xs:string
      # is optional
      street => "example", }, ], }

The comments are helpful (they come from XML::Compile::Schema itself, not from me). They say that the data structure should be a hashref containing an entry called “address”, which is a reference to an array. That array holds hash references, each with two entries, name and street.

From this you can deduce that a valid hash will look something like this.

my $data = {
    address => [
        {
            name => 'name 1',
            street => 'street 1',
        },
        {
            name => 'name 2',
            street => 'street 2',
        }
    ],
};

In order to output the XML, you have to do this:

my $doc    = XML::LibXML::Document->new('1.0', 'UTF-8');
my $write  = $schema->compile(WRITER => 'addresses');
my $xml    = $write->($doc, $data);

$doc->setDocumentElement($xml);

print $doc->toString(1); # 1 indicates "pretty print"

My output looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<addresses>
  <address>
    <name>name 1</name>
    <street>street 1</street>
  </address>
  <address>
    <name>name 2</name>
    <street>street 2</street>
  </address>
</addresses>

The actual XSD and resulting XML files I was dealing with were much more complicated but I followed this process and had no trouble whatsoever.
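
If the sample file is not available, the whole walkthrough can be tried with the schema inlined as a string. This is only a sketch: the schema below is an assumption, reconstructed from the template output above, and is not the original sample file. It also shows the opposite direction, compiling a READER to turn the XML back into a hash.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

use XML::Compile::Schema;
use XML::LibXML;

# ASSUMPTION: reconstructed from the template output, not the original
# sample .xsd. No target namespace, so plain element names work.
my $xsd = <<'XSD';
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="addresses">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="address" maxOccurs="unbounded">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="name"   type="xs:string" minOccurs="0"/>
              <xs:element name="street" type="xs:string" minOccurs="0"/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
XSD

# new() accepts a string of XML as well as a filename.
my $schema = XML::Compile::Schema->new($xsd);

my $data = {
    address => [
        { name => 'name 1', street => 'street 1' },
        { name => 'name 2', street => 'street 2' },
    ],
};

# Hash -> XML
my $doc   = XML::LibXML::Document->new('1.0', 'UTF-8');
my $write = $schema->compile(WRITER => 'addresses');
$doc->setDocumentElement($write->($doc, $data));
my $xml_out = $doc->toString(1);
print $xml_out;

# XML -> hash, using the same schema and element name
my $read       = $schema->compile(READER => 'addresses');
my $round_trip = $read->($xml_out);
print "read back ", scalar @{ $round_trip->{address} }, " addresses\n";
```

Compiling the reader and the writer from the same schema object keeps both directions tied to one definition, so a schema change only needs to be made in one place.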

5 Comments

Nice tutorial - I've tried to follow the XML::Compile documentation, but it's a big body of work for a busy developer under pressure, so a simple working example is just what I need. What I'm trying to do is read XML into a hash. printIndex() displays:

namespace: http://www.abbyy.com/FineReader_xml/FineReader10-schema-v1.xml
filename: /cygdrive/h/tony.wood/xml/xsd/FineReader10-schema-v1.xsd
definitions of elements:
document
definitions of simpleTypes:
ParagraphAlignment
TableCellBorderType
definitions of complexTypes:
BarcodeType
BlockType

... blah blah ...

TextType
WordRecognitionVariant

but I'm having trouble with

$schema->template('PERL', 'my_top_level_elem');

Could you please publish the XSD you are using?

In my XSD I have:

blah blah...


etc....

In this case
$schema->template('PERL', 'document');
issues:
error: cannot find element or attribute `document'

What token do I need to use as the second argument?

Thanks in advance
Tony Wood

Looks like the XSD was lost in the previous comment. Here it is:

targetNamespace="http://www.abbyy.com/FineReader_xml/FineReader10-schema-v1.xml"
xmlns:tns="http://www.abbyy.com/FineReader_xml/FineReader10-schema-v1.xml"
elementFormDefault="qualified">

Schema for representing OCR results exported from FineReader 10.0 SDK. Copyright 2001-2011 ABBYY, Inc.





etc....

I got around this problem by using 'pack_type'

use XML::Compile::Util qw/pack_type/;
my $type = pack_type 'http://www.abbyy.com/FineReader_xml/FineReader10-schema-v1.xml', 'document';
warn $schema->template('PERL', $type);
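
The same packed name should also work for compile(), not only for template(). Below is a minimal sketch under stated assumptions: the namespace http://example.com/ns/demo and the tiny schema are hypothetical stand-ins, not the FineReader schema from the comments above.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

use XML::Compile::Schema;
use XML::Compile::Util qw/pack_type/;
use XML::LibXML;

# ASSUMPTION: a made-up namespace and schema, for illustration only.
my $ns  = 'http://example.com/ns/demo';
my $xsd = <<"XSD";
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="$ns"
           xmlns:tns="$ns"
           elementFormDefault="qualified">
  <xs:element name="document">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="title" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
XSD

my $schema = XML::Compile::Schema->new($xsd);

# pack_type joins namespace and local name into "{namespace}local".
my $type = pack_type $ns, 'document';

# The packed name is used for the template, the writer and the reader.
print $schema->template('PERL', $type);

my $doc   = XML::LibXML::Document->new('1.0', 'UTF-8');
my $write = $schema->compile(WRITER => $type);
$doc->setDocumentElement($write->($doc, { title => 'hello' }));
my $xml_out = $doc->toString(1);
print $xml_out;

# Hash keys stay plain local names even though the XML is namespaced.
my $read = $schema->compile(READER => $type);
my $back = $read->($xml_out);
print "title: $back->{title}\n";
```

The writer picks a namespace prefix for the output on its own, so the element names in the generated XML will carry a prefix while the Perl hash keys do not.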

Thank you for this, and also, thank you to the commenter "tonyw". I had the exact same problem he did, because my schema is namespaced.

Anyway, this post has made a big project I'm working on considerably easier, so thank you. :)

Brian -

Thanks, this is helping me get started. However, I don't want to manually code up a huge hash that I then process. I'm looking to make things a little more dynamic. I have a schema. I have a few files that contain data that I need to format according to the schema. Is there a way to poke at the schema to get a portion that I can then parse in some way? In this manner I can either build up a hash or send pieces of data into the WRITER that I then dump after I'm done processing my data files.

For instance, part of my data is spatial data. Something like "name", "x", "y", "z". The type describing this portion of data in my schema is a complex type named "LocationData_type" (a sequence of 4 elements). I'd read a block of this spatial data, format it according to the type definition in the schema, and then add the data to either a hash that I'm dynamically building, or directly to the WRITER itself.

However, looking at the documentation for XML::Compile::Schema, I'm not seeing a way to determine the needed formatting. I see there's a way, I think, (via hooks and/or typemapping) to add the data. However, I don't see a way to know what a specific type in the schema looks like. This means hard coding the formatting, which means more maintenance when the schema changes.

Any thoughts on this?
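
One possible approach to the first half of this question is to build the hash in a loop from the data rows and hand it to the writer once at the end. The sketch below assumes a hypothetical schema: the "locations" and "location" element names, the xs:int coordinate types, and the whitespace-separated row format are all invented for illustration; only the "LocationData_type" name comes from the comment. It does not inspect the schema to discover the field names; the template() output is still the place to look those up.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

use XML::Compile::Schema;
use XML::LibXML;

# ASSUMPTION: hypothetical schema; element names and xs:int are guesses.
my $xsd = <<'XSD';
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:complexType name="LocationData_type">
    <xs:sequence>
      <xs:element name="name" type="xs:string"/>
      <xs:element name="x"    type="xs:int"/>
      <xs:element name="y"    type="xs:int"/>
      <xs:element name="z"    type="xs:int"/>
    </xs:sequence>
  </xs:complexType>
  <xs:element name="locations">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="location" type="LocationData_type"
                    maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
XSD

my $schema = XML::Compile::Schema->new($xsd);

# Stand-in for lines read from a data file: "name x y z".
my @rows   = ('alpha 1 2 3', 'beta 4 5 6');
my @fields = qw(name x y z);

# Build the hash piece by piece instead of writing it out by hand.
my %data = (location => []);
for my $row (@rows) {
    my %record;
    @record{@fields} = split ' ', $row;    # hash slice: field => value
    push @{ $data{location} }, \%record;
}

# One writer call at the end turns the accumulated hash into XML.
my $doc   = XML::LibXML::Document->new('1.0', 'UTF-8');
my $write = $schema->compile(WRITER => 'locations');
$doc->setDocumentElement($write->($doc, \%data));
my $xml_out = $doc->toString(1);
print $xml_out;
```

Keeping the field list in one array means a schema change to the sequence only requires touching @fields, although the list still has to be kept in step with the schema by hand.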

About Brian E. Lozier

I like Perl.