2011 Perl QA Hackathon - Day 2

Different languages are suited to different things. We know this well and we remind people about it from time to time. For example, Erlang is a great language for running a massively concurrent system, but the language itself is rather slow, so it would be awful for performance-intensive procedural work.

By the same token, there are some things for which Perl is simply not the first choice. If you want to write a rich, cross-platform GUI application, there are a number of choices, but Perl's not your best one by a long shot. I once timidly suggested that a GUI toolkit be pushed into the core to resolve this and I was shot down immediately for excellent technical reasons which nonetheless relegate us to non-contender status in this arena.

If you have an interesting idea for a Web-based application that you want to quickly prototype, PHP is an excellent choice. Many reading this blog would argue in favor of Perl because, after all, we know the "right" way of writing Web apps with proper separation of concerns and all that and PHP is, well, just ugly, right? Meanwhile, we tell people with a straight face that it's also OK to write quick and dirty hacks to get work done, so long as that quick and dirty hack isn't PHP. Everyone in the world quickly listened to us and now PHP is practically ancient history. (Why doesn't HTML have a <sarcasm/> tag?)

Meanwhile, Andy Armstrong found himself needing to process a rather large XML schema from Final Cut Pro and, since he loves Perl, figured it would be a quick and easy task. After all, plenty of languages have nice XML integration that lets you quickly throw XML schemas at them, get some objects, change them, and write the XML back out.

Except Perl. (Bear with me for a moment.)

After researching for a day or so on this "easy task", he gave up and switched to Java where it was, in fact, an easy task. Using any of a number of modules you could toss an XML schema at it and get your objects. Job done.

Note that I'm not talking about random XML. We do have some tools for working with random XML, but consider this:

<countries region="Europe">
    <country><name>France</name></country>
    <country><name>Germany</name></country>
    <country><name>Luxembourg</name></country>
</countries>

How many countries do you have there? In that XML document, you have three. A schema, however, might allow zero or more, or one or more. Other schemas might have a similar structure but max out at, say, five sub-elements. You might have tightly restricted data types. You might have namespaces. You might have all sorts of things which arbitrary XML won't necessarily demonstrate but which a robust XML processor might need to know. Hence, Andy wanting to create objects from schemas and finding Java a much better tool for the job.
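
To make the cardinality point concrete, here is a minimal sketch of a schema that would accept the XML above while constraining how many country elements may appear. The Country type, the limit of five, and the absence of a target namespace are assumptions for illustration only; they are not taken from Andy's schema or from the Corinna test suite.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical schema: type names and limits are illustrative only -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
    <xs:complexType name="Country">
        <xs:sequence>
            <xs:element name="name" type="xs:string"/>
        </xs:sequence>
    </xs:complexType>
    <xs:element name="countries">
        <xs:complexType>
            <xs:sequence>
                <!-- at least one and at most five <country> children -->
                <xs:element name="country" type="Country" minOccurs="1" maxOccurs="5"/>
            </xs:sequence>
            <xs:attribute name="region" type="xs:string"/>
        </xs:complexType>
    </xs:element>
</xs:schema>
```

Changing the occurrence attributes to minOccurs="0" and maxOccurs="unbounded" would give "zero or more"; leaving both off defaults each to 1, meaning exactly one. None of that is visible from a sample document alone, which is why the schema has to drive the object model.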

To say that Perl is lacking in this area is like saying your child's tricycle skills aren't quite ready for the Tour de France.

After hearing about Andy's problem, I also did some research and was extremely disappointed. Though it's not QA-related, I decided to take a quick stab at this and you'll now find Corinna, a fork of XML::Pastor, on GitHub. I've notified the author of XML::Pastor about the fork, but given that bugs have been piling up and he apparently stopped development a couple of years ago, I'm unsure if I'll hear back from him.

The reason for the fork is two-fold:

  • I have the freedom to change the API
  • If the author wants to hack on XML::Pastor again, there will be no conflicts

So let's say that you have an XML schema representing countries:

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns="http://www.example.com/country" xmlns:xs="http://www.w3.org/2001/XMLSchema" targetNamespace="http://www.example.com/country" elementForm
    <xs:simpleType name="CodeUNLocode">
        <xs:restriction base="xs:string">
            <xs:minLength value="2"/>
            <xs:maxLength value="6"/>
            <xs:pattern value="qsd"/>
            <xs:pattern value="hello"/>
        </xs:restriction>
    </xs:simpleType>

    <!-- lots of stuff snipped -->

    <xs:complexType name="Population">
        <xs:attribute name="date" type="xs:date"/>
        <xs:attribute name="figure" type="xs:nonNegativeInteger"/>
    </xs:complexType>
    <xs:element name="city" type="City"/>
    <xs:element name="country" type="Country"/>
    <xs:element name="population" type="Population"/>
</xs:schema>

You can then read the schema:

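use strict;
use warnings;
use Corinna;

my $corinna = Corinna->new();

# offline + multiple: write the generated classes to disk,
# one module per class (as in XML::Pastor, which Corinna forks)
$corinna->generate(
    mode         => 'offline',
    style        => 'multiple',
    schema       => '/some/path/to/schema.xsd',
    class_prefix => 'MyApp::Data::',
    destination  => '/tmp/lib/perl/',
);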

And then do stuff like this:

my $country = MyApp::Data::country->from_xml('http://some/url/to/country.xml');

# Now you can manipulate your country object.
print $country->name;
print $country->currency->_code;
print $country->city->[0]->name;

You can also update the data and write the XML back out, if desired.

I would like to upgrade Corinna to use Moose extensively (I've only done it for one of the modules) and make it optionally produce Moose code. Currently it produces eye-gougingly ugly code like this:

Corinna::Test::Type::CountryNameType->XmlSchemaType( bless( {
     'attributeInfo' => {
                        'Code' => bless( {
                                         'class' => 'Corinna::Test::Type::ISO3166',
                                         'documentation' => bless( [
                                                                     bless( {
                                                                              'text' => 'ISO 3166 code for a country.',
                                                                              'xml:lang' => 'en'
                                                                            }, 'Corinna::Schema::Documentation' )
                                                                   ], 'Data::HashArray' ),
                                         'metaClass' => 'Corinna::Test::Pastor::Meta',
                                         'name' => 'Code',
                                         'scope' => 'local',
                                         'targetNamespace' => 'http://www.example.com/country',
                                         'type' => 'ISO3166|http://www.example.com/country',
                                         'use' => 'optional'
                                       }, 'Corinna::Schema::Attribute' )
                      },
     'attributePrefix' => '_',
     'attributes' => [
                       'Code'
                     ],

My main obstacle right now is bandwidth. I'd love to encourage more people to hack on Corinna and maybe tackle the (rather small) bug queue. With any degree of luck, we can make working with XML schemas a "solved problem" in Perl.

Note that Chris Prather is trying to resurrect Class::OWL to do this with RDF documents. This is awesome and I would like to see it move forward. Be forewarned that RDF and XML schemas are not interchangeable.

Also, some people have a vague sense that XML::Compile might also be suitable for this problem, but as one person explained, "I'll be damned if I can figure it out". Mark, if you're reading this, could you elaborate and maybe give a working example of how XML::Compile is good for this?

6 Comments

I actually tried solving this problem with my W3C::XMLSchema CPAN module (uses XML::Rabbit under the hood), but I never found time to finish it. It was supposed to enable extraction of structure from XSD files which then could be used to build classes in whichever way you wanted. It is all based on Moose and libxml. The source is on GitHub.

Lots of thoughts on this (mostly in the "yep, that's something people often think they want at a certain very early stage of development but never ends up being what they actually want once the rubber starts hitting the road" vein) but did you check out XML::Toolkit?

I think XML::Compile::Schema would work for this problem.

  1. Create a reader coderef to parse the XML into a big fat hash (according to the schema).

  2. Manipulate the hash in all the sundry ways that people do, then

  3. Spit out the resulting XML by handing the hash (and a new libXML object) to a writer coderef which turns it back into XML based on the schema.

Maybe I misunderstood the problem statement, but I suspect that would work. See the module synopsis for details. I tend to use XML::Compile in the context of SOAP handling, but I think it would do what you're looking for as well.

Could you elaborate on the position that wxPerl is inadequate as far as cross-platform GUIs go? I've not done Perl GUI development, but had assumed Wx was pretty good.

I've had some luck using XML::Compile but it does take some getting used to.

About Ovid

Have Perl; Will Travel. Freelance Perl/Testing/Agile consultant. Photo by http://www.circle23.com/. Warning: that site is not safe for work. The photographer is a good friend of mine, though, and it's appropriate to credit his work.