Pure-Perl XML
In the past I sometimes used XML::Tiny and found it perfect for the job. Admittedly, I only had to deal with small XML documents whose format was under my control, so I knew I could do without a full-fledged XML parser.
On the flip side, this rating by Aristotle has always bugged me. I respect Aristotle's opinion a lot, so it was enough to make me look for alternatives... just in case XML::Tiny failed me (which hasn't happened so far, anyway).
I've been quite disappointed by the suggested XML::Parser::Lite, though. Here's a little example script condensing my findings:
#!/usr/bin/env wrapperl
use strict;
use warnings;
use XML::Parser::Lite;
use XML::Parser;
use XML::Tiny ();
my $xml = <<'END';
<?xml version="1.0"?>
<what>
<ever>&lt;<![CDATA[&foo <=> &bar]]>&gt;</ever>
</what>
END
$" = '], [';
my ($what_ever, $collect);
my %handler_for = (
    Init  => sub { $collect = $what_ever = ''; },             # reset state
    Start => sub { $collect = ($_[1] eq 'ever'); },           # collect only inside <ever>
    Char  => sub { $what_ever .= $_[1] if $collect; },        # accumulate character data
    End   => sub { $collect = '' },                           # stop collecting
);
print "perl $]\n";
print "XML----------------\n$xml-------------------\n";
for my $class (qw< XML::Parser XML::Parser::Lite >) {
    my $version = do {
        no strict 'refs';
        ${$class . '::VERSION'};
    };
    print "$class $version\n";
    my $parser = $class->new();
    $parser->setHandlers(%handler_for);
    $parser->parse($xml);
    print " what/ever => [$what_ever]\n";
} ## end for my $class (qw< XML::Parser XML::Parser::Lite >)
open my $fh, '<', \$xml or die "$!";
my $doc = XML::Tiny::parsefile($fh);
print "XML::Tiny $XML::Tiny::VERSION\n";
print " what/ever => [$doc->[0]{content}[0]{content}[0]{content}]\n";
(If you're wondering what that thing in the hash-bang is, you can read about it here.)
I threw XML::Parser in just to have a control group. Let's run it:
perl 5.018001
XML----------------
<?xml version="1.0"?>
<what>
<ever>&lt;<![CDATA[&foo <=> &bar]]>&gt;</ever>
</what>
-------------------
XML::Parser 2.44
what/ever => [<&foo <=> &bar>]
XML::Parser::Lite 0.721
what/ever => [<>]
XML::Tiny 2.06
what/ever => [<&foo <=> &bar>]
So, it seems that CDATA sections aren't handled well by XML::Parser::Lite, which is a bit surprising given that it is presented as an implementation of a complete XML parser.
The module is based on this article from 1998, which seems to support CDATA (at least at a shallow inspection). Maybe the translation into Perl code failed at some point?
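To make explicit what a parser should return for the text of the ever element, here is a minimal pure-Perl sketch. It assumes the element body is written as `&lt;`, then the CDATA section, then `&gt;`, and it only handles plain character data and CDATA (no nested elements, only the five predefined entities). The helper name `text_of` is made up for this illustration and is not part of any of the modules discussed.

```perl
use strict;
use warnings;

# The five entities predefined by XML.
my %entity = (lt => '<', gt => '>', amp => '&', quot => '"', apos => "'");

# Hypothetical helper: returns the text content of an element body that
# contains only character data and CDATA sections.
sub text_of {
    my ($raw) = @_;
    my $text = '';
    while ($raw =~ m{\G(?:<!\[CDATA\[(.*?)\]\]>|([^<]+))}gcs) {
        if (defined $1) {
            $text .= $1;    # CDATA content is taken verbatim
        }
        else {
            # Ordinary character data: decode predefined entities
            (my $chunk = $2) =~ s/&(lt|gt|amp|quot|apos);/$entity{$1}/g;
            $text .= $chunk;
        }
    }
    return $text;
}

print text_of('&lt;<![CDATA[&foo <=> &bar]]>&gt;'), "\n";
```

The key point is that the CDATA content contributes to the text exactly like ordinary character data, only without entity decoding, so dropping it changes the result.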
Update 1 (2016-01-24 09:02:25): Looking at the big regexp in XML::Parser::Lite, it matches the CDATA section but simply discards it. Compare the following lines:
my $CDATA_CE = "$UntilRSBs(?:[^\\]>]$UntilRSBs)*>";
#...
my $PI_CE = "($Name(?:$PI_Tail))>(?{${package}::_xmldecl(\$5)})";
Where there is a callback in $PI_CE to call _xmldecl(), there is no callback in $CDATA_CE, which makes the regexp accept the CDATA but just throw it away. XML::Parser::LiteCopy seems to address this via a CData handler.
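The effect of a missing callback can be reproduced in isolation. The following is only a sketch with heavily simplified patterns of my own (they are not the ones in XML::Parser::Lite, and the `@events` array is just a stand-in for handler calls): one alternative carries an embedded `(?{ ... })` code block, the other does not.

```perl
use strict;
use warnings;

# Stand-in for the parser's handler calls (a package variable, so that it
# is visible from inside the regexp code block on older perls too).
our @events;

my $input = '<?xml version="1.0"?><![CDATA[&foo <=> &bar]]>';

# Consume the input one token at a time, anchored at the previous position.
while (
    $input =~ m{
        \G
        (?:
            <\? (\w+) [^?]* \?>            # processing instruction
            (?{ push @events, "pi:$1" })   # embedded callback
          |
            <!\[CDATA\[ .*? \]\]>          # CDATA section, no callback
        )
    }gcxs
) { }

print "events: @events\n";
print "consumed everything\n" if pos($input) == length $input;
```

Both tokens are consumed, but only the processing instruction leaves a trace in `@events`: the CDATA section is matched and silently dropped, which is the behaviour described above.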
Have you considered and/or tried Mojo::DOM? If you have decided against it can I ask why?
@Joel thanks for the hint. Mojo::DOM seems fine for projects where I use Mojolicious, but for standalone-ness I'd probably look into DOM::Tiny, which is derived from it.
I'd be interested in any discussion about "practicality" vs "correctness" as the comment from Aristotle seemed to imply. And possibly understand whether YAX and XML::Parser::REX are worth investigating or not!
It really is too bad that people don't realize how small the Mojolicious namespace (the web framework) is even relative to the Mojo namespace (the web toolkit), which is pretty tiny itself.
@Joel not sure what you mean by your comment. My goal is the equivalent of fat-packing; in such cases, a module that has fewer dependencies (especially when they are guaranteed) is easier to include than one with more (especially when they are unbounded in time).
It's a restriction of the tool I use (mobundle in http://repo.or.cz/deployable.git), which requires me to name exactly what I want to include. I tend to see this as a feature (gives more control), others might just mark this as lazy programming (might give more dwimmery). Mojo/Mojolicious can grow/shrink in number of inter-dependent components as you guys see fit, this simply clashes against my need.
I've used Mojolicious (the distro) a few times and it does its work. Its goal is to deliver a web framework with a lot of batteries included (i.e. the web toolkit), not to be a pack of batteries. I really can't see it as a general purpose library of functions, I didn't see it advertised as such anywhere, my time and energies are too limited to even try and ask about this (I once tried to discuss a problem I had while debugging, just to discover how easily I can get frustrated and learn the equivalent of touching a boiling pot of water for a small child: "Don't do that again").
When fat packing is the goal then yes you may indeed be right. More often the complaint is simply "why would I want to pull in a web framework when I want to parse xml?" to which my answer is "why not".
To your other point. I'm sorry to hear that you weren't happy with your experience with asking a question. Like all communities we have our up and down days. I hope you'll try is again sometime if you need help, I personally as well as the community as a whole, have been trying to make sure the community id's welcoming.
And those typos are what I get for trying to comment from my phone :p
I should point out that I wrote XML::Tiny with the explicit aim of annoying and trolling the hell out of people who take the religion of XML far too damned seriously.
It worked.
@Joel: I agree too, Mojolicious is not the kind of "installing half of CPAN" thing that sometimes makes me look for alternatives.
WRT my community experience, there are always two ends in a communication and they can be both wrong, mine included (as it often happens). My personal experience was rather bad and this is a reason why I would personally think twice before using the framework outside its "institutional" goal.
@David: I hope that was a funny side effect for you :) Reading old (experience-wise, not age-wise) Perlers like you and Aristotle argue always sounds like Mom and Dad yelling at each other, but it's life. Aristotle's criticism probably helped make XML::Tiny better, which is a good thing anyway (hubris at work?!?) - thank you both :)
> I really can't see it as a general purpose library of functions, I didn't see it advertised as such anywhere
While this has always been a goal of the Mojo namespace, perhaps we haven't been clear/loud enough about it. We have made several changes in our documentation to reflect this, most prominently and explicitly in our official mission statement seen here at http://mojolicious.org/perldoc/Mojolicious/Guides/Contributing#Mission-statement but also sprinkled elsewhere.
@Joel thanks!