Count-up to 100 CPAN Distributions: Quality *as well as* Quantity

Going over the previous count-ups to 100 CPAN distributions on this blog, I feel I got carried away: I tried to release a lot of new distributions to CPAN, while neglecting the existing distributions I have, and not tending to their pending problems and their shortcomings in several aspects of software quality (see an older essay I have written about that).

I was reminded of this while working on a few new or existing distributions using Dist-Zilla. I ran into several places where some of its plugins or features were underdocumented and lacked examples (in the SYNOPSIS sections or elsewhere), which consumed many hours of my time, sometimes forced me to read or grep the code, and left me frustrated. While I sent the maintainers some pull requests with better documentation (and some of them were applied), I would have preferred for these features to be properly documented and exemplified in the first place.

While I believe most of my CPAN distributions are in better shape, there are some aspects that are lacking, and I would like to fix them: more examples outside the SYNOPSES, and better SYNOPSES; for XML-LibXML, documenting all the methods of the superclasses as well as of the classes themselves; adding support for DTD introspection and the oft-requested DOM element annotations; more kwalitee; and so on. Naturally, this also includes better publicity and advertising (Freecode.com (formerly known as Freshmeat.net), Reddit, blogs.perl.org, video screencasts, etc.), because people should see how cool your code is. I'm not sure if any of these things would be as blog-worthy as new CPAN distributions, but they will make CPAN a bit better.

(Now that I think of it, perhaps a CPAN janitors initiative, which would submit pull requests and patches to improve the quality of existing distributions, would be a good idea.)

Anyway, I am going to split this report into sections:

XML-GrammarBase

After some work (and some misunderstanding of the Moo / MooX documentation, which resulted in a few pull requests), I released XML-GrammarBase, which aims to provide roles and base classes for easily writing wrappers around XML schema formats and XML translation languages. So far, only RELAX NG and XSLT are supported. Here is the synopsis from the docs:

package XML::Grammar::MyGrammar::ToOtherStuff;

use MooX 'late';

use XML::GrammarBase::Role::RelaxNG;
use XML::GrammarBase::Role::XSLT;

with ('XML::GrammarBase::Role::RelaxNG');
with XSLT(output_format => 'html');
with XSLT(output_format => 'docbook');

has '+module_base' => (default => 'XML-Grammar-MyGrammar');
has '+rng_schema_basename' => (default => 'my-grammar.rng');

has '+to_html_xslt_transform_basename' => (default => 'mygrammar-xml-to-html.xslt');
has '+to_docbook_xslt_transform_basename' => (default => 'mygrammar-xml-to-docbook.xslt');

package main;

my $xslt = XML::Grammar::MyGrammar::ToOtherStuff->new(
    data_dir => "/path/to/data-dir",
);

# Throws an exception on failure.
my $as_html = $xslt->perform_xslt_translation(
    {
        output_format => 'html',
        source => {file => $input_filename, },
        output => "string",
    }
);

perform_xslt_translation() supports a myriad of options and formats, and you get them all for free.

I've also been contemplating something like App::XML::GrammarBase or XML::GrammarBase::App, which would allow for easy registration of such formats and for processing them from the command line.

At first, I believed that Moo was "almost-completely-unlike-Moose", but «use MooX 'late';» makes it much more Moose-like, and I am enjoying it. It is now recommended over Any::Moose and maybe even over plain Mouse. Mouse is very fast as well and has a far smaller startup time than Moose, but it does not interoperate very well with pure-Moose classes, which is something Moo does well.
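
To give a feel for the difference, here is a small, self-contained sketch of the kind of Moose-style declarations that MooX::late lets one keep writing under Moo (the class and attribute names are made up for the sake of illustration):

package My::Example::Point;

use MooX 'late';

# Under MooX::late, Moose-style string type constraints such as
# isa => 'Num' work as-is; under plain Moo one would have to pass a
# validation coderef (or a Type::Tiny object) instead.
has 'x' => (is => 'rw', isa => 'Num', default => 0);
has 'y' => (is => 'rw', isa => 'Num', default => 0);

sub norm
{
    my ($self) = @_;

    return sqrt($self->x() ** 2 + $self->y() ** 2);
}

package main;

use strict;
use warnings;

my $point = My::Example::Point->new(x => 3, y => 4);
print $point->norm(), "\n";    # Prints 5.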

HTML-Widgets-NavMenu-ToJSON

As I was working on my homepage, I noticed that placing all the sub-navigation menus inside the main navigation menu increased the size of the navigation menu considerably, making it consume over 40,000 bytes. I decided to emit only a subset of it as HTML, and to load and populate the rest using an AJAX call to a common and static .json file. So I needed a way to convert the input of HTML-Widgets-NavMenu into JSON.

So enter HTML::Widgets::NavMenu::ToJSON, which serialises a navigation menu structure as JSON. I also implemented a persistence layer for it, currently with only one backend (a YAML file), so that the IDs which jqTree requires would be, well, persisted. This is a bit of architecture astronautics, but I ended up with something concrete at the end, so it's under control.

Here is the synopsis:

use HTML::Widgets::NavMenu::ToJSON;
use HTML::Widgets::NavMenu::ToJSON::Data_Persistence::YAML;

my $persistence =
    HTML::Widgets::NavMenu::ToJSON::Data_Persistence::YAML->new(
        {
            filename => '/path/to/persistence_data.yaml',
        }
    );

my $obj = HTML::Widgets::NavMenu::ToJSON->new(
    {
        data_persistence_store => $persistence,
        # The one given as input to HTML::Widgets::NavMenu
        tree_contents => $tree_contents,
    }
);

use IO::All;

io->file('output.json')->println(
    $obj->output_as_json(
        {
            %args
        }
    )
);

It could be made a bit simpler using an abstraction, but I feel it's not too bad. You can write custom HTML::Widgets::NavMenu::ToJSON::Data_Persistence child classes by implementing the load() and save() methods - I have not documented the process yet, but you can base your efforts on HTML::Widgets::NavMenu::ToJSON::Data_Persistence::YAML.

Dist-Zilla-Plugin-TestRun

Since I'm used to running ./Build runtest instead of ./Build test for my distributions, I sought a way of doing the same using Dist-Zilla. It turned out there wasn't an easy way, and Dist-Zilla-Plugin-TestRun is the best I could do. I hope to write a patch to make the test command more configurable.

Test-XML-Ordered

Test::XML::Ordered performs a non-"semantic" diff of XML files and tells you whether they are equivalent (up to differences in whitespace and namespace prefixes). It grew out of my frustration with XML::SemanticDiff (upon which Test::XML is based, and which I maintain). A quick CPAN search revealed other modules for testing XML in a similar manner, but since I had already written my code, I decided to release it.

One tip is that with the current API, one should use something like:

use Test::XML::Ordered qw( is_xml_ordered );

my @common = (validation => 0, load_ext_dtd => 0, no_network => 1);

is_xml_ordered (
    [ string => normalize_xml($results_buffer), @common, ],
    [ location => "./t/data/xhtml-results/$fn_base.xhtml", @common, ],
    "Testing for Good XSLTing of '$fn_base'",
);

in order to test XHTML documents for equivalence.
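
And here is a more minimal, self-contained illustration of the "equivalent up to namespace prefixes" behaviour mentioned above; the two documents differ only in the prefix bound to the same namespace URI (the URI and element names are, of course, made up):

use strict;
use warnings;

use Test::More tests => 1;
use Test::XML::Ordered qw( is_xml_ordered );

my $with_one_prefix =
    q{<a:root xmlns:a="http://example.com/ns"><a:item>Text</a:item></a:root>};
my $with_another_prefix =
    q{<b:root xmlns:b="http://example.com/ns"><b:item>Text</b:item></b:root>};

# Different prefixes, same namespace URI - these should be reported
# as equivalent.
is_xml_ordered(
    [ string => $with_one_prefix, ],
    [ string => $with_another_prefix, ],
    "Namespace prefixes do not affect the comparison",
);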

Vered-XML

Vered-XML (currently only available in the Perl Beginners' Site's repository) is probably the most ad hoc and narrowly tailored XML grammar I have ever created. It was used to convert the markup of the "Perl Elements to Avoid" page from hacky Website META Language markup to XML, so that it will be easier to translate into different (human) languages and also somewhat easier to maintain and render. Currently it renders to DocBook/XML, which allows conversion to many other formats.

So why did I call it Vered? Since it was so ad hoc, and I didn't know what else it would be useful for, I wanted to give it an artsy name, and considered using a random Hebrew feminine name. Then I thought that "a rose by any other name would smell as sweet", and considered calling it Rose, but translated into Hebrew that is "Vered", which is also a Hebrew feminine name. I still have to tie up some loose ends, but expect XML-Grammar-Vered at a CPAN mirror near you Real Soon Now™.

Vered-XML supports a subset of XHTML, as well as some specialised tags such as <pdoc /> (perldoc), <pdoc_f /> (perldoc -f), <cpan_self_mod /> (a CPAN self-link to a module), etc., and contains a RELAX NG schema and an xsltproc/libxslt-compatible XSLT 1.0 stylesheet. Here is an excerpt from the "Perl Elements to Avoid" source XML:

<item xml:id="slurp">
<info>
<title>Slurping a file (i.e: Reading it all into memory)</title>
</info>
<p>
One can see several bad ways to read a file into memory in Perl. Among them
are:
</p>
<code_blk syntax="perl">
# Not portable and suffers from possible
# shell code injection.
my $contents = `cat $filename`;
# Wasteful of CPU and memory:
my $contents = join("", &lt;$fh&gt;);
# Even more so:
my $contents = '';
while (my $line = &lt;$fh&gt;)
{
    $contents .= $line;
}
</code_blk>
<p>
You should avoid them all. Instead the proper way to read an entire file
into a long string is to either use CPAN distributions for that such as
<cpan_self_dist d="File-Slurp" /> or
<cpan_self_dist d="IO-All" />, or alternatively
write down the following function and use it:
</p>
<code_blk syntax="perl">
sub _slurp
{
    my $filename = shift;
    open my $in, '&lt;', $filename
        or die "Cannot open '$filename' for slurping - $!";
    local $/;
    my $contents = &lt;$in&gt;;
    close($in);
    return $contents;
}
</code_blk>
</item>

I'm not sure if Vered-XML will have much utility outside that use case, but I think that putting it on CPAN won't hurt much.
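
When XML-Grammar-Vered does land on CPAN, its processor class will presumably be little more than an instantiation of the XML-GrammarBase roles shown earlier. A sketch of what I have in mind is below; the class name and the schema/stylesheet basenames are placeholders rather than the final ones:

package XML::Grammar::Vered::ToDocBook;

use MooX 'late';

use XML::GrammarBase::Role::RelaxNG;
use XML::GrammarBase::Role::XSLT;

with ('XML::GrammarBase::Role::RelaxNG');
with XSLT(output_format => 'docbook');

# The basenames below are hypothetical placeholders.
has '+module_base' => (default => 'XML-Grammar-Vered');
has '+rng_schema_basename' => (default => 'vered-xml.rng');
has '+to_docbook_xslt_transform_basename' => (default => 'vered-xml-to-docbook.xslt');

1;

Converting a document would then be a matter of calling perform_xslt_translation() on an instance of this class, exactly as in the XML-GrammarBase synopsis above.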

Improving perlresume.org

I filed a few issues and sent some pull requests for perlresume.org for some problems I ran into, and vti fixed or applied them promptly. Thanks!

Beginners' Sites

I invested more work into the Perl Beginners' site, and also, after some encouragement from the good people on #vim, began work on the Vim Beginners' site, which I publicised on my tech blog, on Reddit, and on some mailing lists. While in the past I used my own static site generator, "Latemp" (based on Website Meta Language), for sites like that, I used Jekyll this time, at least for the first prototype. I did so in order to evaluate it for use on a different site (the Israeli Linux Portal, which is currently implemented using an awkward custom PHP-based system that mostly just reads content out of XML data files and outputs constant, static HTML via XSLT). While at first I was impressed with Jekyll, I then realised that it was quite opaque, often required many plugins to get some basic functionality, and that one plugin I looked at lacked examples and was otherwise underdocumented.

There is a list of many more static site generators / offline content-management systems around (which does not include my own Latemp), and it may be easier to roll something of my own based on Template Toolkit or its ttree utility. All this just confirms my (and Su-Shee's) suspicion that every self-respecting web programmer has written at least one.
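
For the record, the core of such a roll-your-own generator would be little more than a loop around Template Toolkit's process() method. Here is a minimal sketch; the directory layout, template name, and page variables are made up for illustration:

use strict;
use warnings;

use Template;

# Render every page through a shared page template.
my $tt = Template->new(
    {
        INCLUDE_PATH => 'src/templates',
        OUTPUT_PATH  => 'dest',
    }
) or die Template->error();

my %pages = (
    'index.html' => { title => 'Home', },
    'about.html' => { title => 'About this site', },
);

foreach my $output_fn (sort keys %pages)
{
    $tt->process('page.tt', $pages{$output_fn}, $output_fn)
        or die $tt->error();
}

In a real generator, the %pages hash would naturally be read from some data files (or scanned from a source directory) rather than hard-coded.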

Freecell Solver for JavaScript - ☺

I have been distracted for many hours trying to port Freecell Solver to JavaScript using the very cool Emscripten LLVM-bitcode-to-JavaScript compiler. The mostly final result could still use some work, and is still under-optimised (in large part due to what appears to be an Emscripten bug), but it's working.

The Emscripten wiki contains links to many, much more impressive, demos including a JavaScript version of the open-source graphical game "Me and My Shadow".

Two Anecdotes from ##programming

I enjoy chatting on ##programming (on Freenode), and along with all the rants about how much every popular programming language in existence sucks, and the attempts to help various students with their often badly indented and badly formatted homework code, there are often some insightful discussions by some smart people. I'd like to share two Perl-related anecdotes from there.

The first one was that someone there, who took studying programming seriously before starting university, and has also read the Camel book and other books and resources about Perl, asked to look at some of my code to see how readable it is. I referred him to the source code of HTML::Widgets::NavMenu, which, while being early code of mine (and still not fully modernised), was written in a modular manner, and he said he found it to be well-factored and yet very readable. He noted that he suspected Perl would be less widely believed to be "hard to understand" or "unreadable" if it standardised on my preferred brace style - Allman style - instead of its default style with the opening brace on the same line as the opening clause (e.g.: if ( COND ) {).

The other anecdote was that someone else there, who works in Visual Basic and Visual Basic for Applications for a living (for lack of better software development job prospects in his homeland), told us that he found the CPAN documentation to usually be superior to the MSDN (Microsoft Developer Network) documentation. I found that a little strange, because I recall the MSDN documentation giving detailed examples for almost every subroutine, along with a page of prose and explanation for each one, and reportedly large parts of it were also translated into some non-English languages by Microsoft. Then I showed him the "SYNOPSIS" sections of some of the .pm files on CPAN, and he was even more impressed and said they were useful (because you can copy+paste them and tweak them), but naturally commented that not all .pm files have them.

Conclusion

I currently stand at 87 CPAN distributions, some of which are a bit lacking, and I still have some in the pipeline. One of them will require quite a lot of cleanup work, but it's not a high priority. I also have to juggle some other endeavours, and I recently started contracting for an Internet friend as a Drupal developer (and in the future possibly as a Moodle one as well), which may consume even more of my time (but, on the bright side, pays a little money). It's no secret that I am not a big fan of PHP, but Drupal seems impressive so far, and hides away a lot of PHP's ugliness. It took me some time to get acquainted with it, so I hope things will now be smoother and easier to do. It seems that there's quite a bit of demand in Israel (and possibly elsewhere as well) for converting sites to Drupal, setting up new ones, or maintaining them, so it is probably a useful skill to have under my belt.

Hope you enjoyed this hacktivity log, and see you next time. Byeeeee!

About Shlomi Fish

An Israeli software developer, essayist, and writer, and an enthusiast of open/free software and cultural works. I've been working with Perl since 1996.