PDL 2.063_01 released

There have been a couple of developments in PDL since the last announcement I could find on here, from 2013. To hypersummarise: 64-bit indexing, native complex number support, automatic pthreading using all available CPU cores, faster installation thanks to parallel building, memory-mapped data, the repository now hosted on GitHub, and easy use with Inline's "with" feature. Returning you to the announcement:

PDL 2.063_01 has just been released. Notable changes since 2.062:


  • Various API changes (see below)
  • Improvements to $MACRONAME() handling including that arguments …

graphql-perl - plugin to make GraphQL "just work" with Mojo publish/subscribe functionality

GraphQL is the new, shiny way to do APIs. It minimises the number of round-trips clients must make to query what they need. But what about real-time updates? How can we cut down the time needed for clients to get new information? Are we forcing them to constantly poll? That seems expensive and also slow.

GraphQL's official standard now includes "subscriptions". The obvious transport for that is WebSockets, and the de facto standard for that is Apollo GraphQL's subscriptions-transport-ws.
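As a rough picture of what travels over that WebSocket, here is a minimal sketch of one subscription's lifetime as subscriptions-transport-ws messages, encoded with the core JSON::PP module. The message type names reflect my understanding of that protocol. The subscription query, the ticketUpdated field and the payload data are all hypothetical, and this is not graphql-perl's API.

```perl
use strict;
use warnings;
use JSON::PP;   # core module

my $json = JSON::PP->new->canonical;   # canonical: sorted keys, stable output

# One subscription's lifetime over a single WebSocket, in order.
# C = client-to-server, S = server-to-client. The query and data are made up.
my @exchange = (
    [ C => { type => 'connection_init', payload => {} } ],
    [ S => { type => 'connection_ack' } ],
    [ C => { type => 'start', id => '1',
             payload => { query => 'subscription { ticketUpdated { id status } }' } } ],
    # Pushed by the server whenever the event fires: no client polling needed.
    [ S => { type => 'data', id => '1',
             payload => { data => { ticketUpdated => { id => 3, status => 'closed' } } } } ],
    # The client ends the subscription identified by id "1".
    [ C => { type => 'stop', id => '1' } ],
);

for my $step (@exchange) {
    my ($direction, $message) = @$step;
    print "$direction: ", $json->encode($message), "\n";
}
```

The `id` ties each `data` message back to the `start` that requested it. This lets one connection carry several concurrent subscriptions.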

Memoising - standardisation of normalisation

Hopefully, the sesquipedalian polysyllabalisation of the title will have made your eyes glaze over. Now to wake you up: MASSIVE PERFORMANCE GAINZ!

Memoising, as any fule kno, is storing answers after they're calculated, against each set of inputs, so you don't have to keep re-calculating the same thing. If the calculating process is expensive, or even just more expensive than normalising (see below) and a lookup, this can make your programs run faster. Possibly much faster!

This is a thing that works great for computing factorials or Fibonacci numbers. It's a terrible idea for calculating the lengths of simple lists, since normalising would take longer than simply recalculating, and, worse, would be very unlikely to give repeated hits.
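For the Fibonacci case, Perl's core Memoize module does the job with one call. The sketch below is minimal. The call counter exists only to show how often the function body actually runs.

```perl
use strict;
use warnings;
use Memoize;    # core module

my $calls = 0;

sub fib {
    my ($n) = @_;
    $calls++;                    # counts real calculations, not cache hits
    return $n < 2 ? $n : fib($n - 1) + fib($n - 2);
}

# Replace fib with a caching wrapper. The recursive calls inside fib
# are looked up by name, so they also go through the cache.
memoize('fib');

my $result = fib(30);
print "fib(30) = $result after $calls calls\n";
```

With memoisation in place, the body runs once per distinct input (0 to 30), so 31 times in total. The naive recursion needs millions of calls for the same answer.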

A hybrid answer for that case would be to store only the lengths, but even that would not add value in that specific case. If the normalised input is the answer, normalise-plus-lookup is guaranteed to be slower than simply normalising, which already is the calculation. It does, however, give a hint that calculating (ha!) whether to normalise or not depends partly on the relationship between normalising and calculating, together with the likelihood of cache hits in that domain.

The memoising process works like so:

  • normalise the inputs: getting the right thing to store answers against
  • look to see if an answer for those inputs exists, and return it if so
  • calculate the answer
  • store that
  • return it
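The five steps map directly onto a small closure-based wrapper. Below is a minimal sketch in plain Perl. Both `memoise_with` and the order-insensitive sum are hypothetical examples and are not part of any module.

```perl
use strict;
use warnings;

# Hypothetical helper: cache $calculate's answers against $normalise->(@args).
sub memoise_with {
    my ($calculate, $normalise) = @_;
    my %cache;
    return sub {
        my $key = $normalise->(@_);                  # normalise the inputs
        return $cache{$key} if exists $cache{$key};  # look up; return if found
        my $answer = $calculate->(@_);               # calculate the answer
        $cache{$key} = $answer;                      # store that
        return $answer;                              # return it
    };
}

# Example: summing a set of numbers, where order doesn't matter, so the
# normaliser sorts the inputs and (1, 2, 3) and (3, 2, 1) share one entry.
my $work = 0;
my $sum_set = memoise_with(
    sub { $work++; my $total = 0; $total += $_ for @_; return $total },
    sub { return join ',', sort { $a <=> $b } @_ },
);

print $sum_set->(1, 2, 3), "\n";   # calculated
print $sum_set->(3, 2, 1), "\n";   # cache hit: same normalised key
```

The normaliser does the real work of deciding what counts as "the same input". A better normaliser means more cache hits.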

The problem here, such as it is, is that while the Perl Memoize module obviously has a facility for normalising inputs, I would say it's orientated more towards functions than methods. My claim is that what's needed here is a convention, or even just a module (probably based heavily on Attribute::Memoize) to make this easy. It would probably just default to using a method called normalise in the existing package.
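Here is a minimal sketch of what such a convention might look like, without any attribute syntax. A hypothetical `memoise_method` helper wraps a named method so that answers are stored against whatever the object's own `normalise` method returns. The helper, the Greeter class and its methods are all invented for illustration. This is not Attribute::Memoize's API.

```perl
use strict;
use warnings;

# Hypothetical helper: replace $class->$method with a caching wrapper whose
# cache key is $self->normalise($method, @args).
sub memoise_method {
    my ($class, $method) = @_;
    no strict 'refs';
    no warnings 'redefine';
    my $orig = \&{"${class}::$method"};
    my %cache;
    *{"${class}::$method"} = sub {
        my ($self, @args) = @_;
        my $key = $self->normalise($method, @args);
        return $cache{$key} if exists $cache{$key};
        return $cache{$key} = $self->$orig(@args);
    };
    return;
}

package Greeter;
my $computed = 0;   # counts real calculations
sub new { my ($class, %args) = @_; return bless { word => $args{word} }, $class }
# The key must include any object state that affects the answer
# (here, the greeting word), not just the arguments.
sub normalise {
    my ($self, $method, @args) = @_;
    return join "\0", $method, $self->{word}, map { lc } @args;
}
sub greet { my ($self, $name) = @_; $computed++; return "$self->{word}, \u\L$name!" }

package main;
memoise_method('Greeter', 'greet');

my $en = Greeter->new(word => 'Hello');
print $en->greet('alice'), "\n";                          # calculated
print $en->greet('ALICE'), "\n";                          # cache hit: same key
print Greeter->new(word => 'Hi')->greet('alice'), "\n";   # different state: calculated
```

The design choice here is that the class, not the caller, decides what "the same input" means. This matters for methods because object state is part of the input.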

A remaining problem would be what I am currently thinking of as "partial memoisation". This is inspired by considering SQL::Abstract's select method. This call:

$sql->select('tickets', '*', { id => 3 });

returns:

("SELECT * FROM tickets WHERE ( id = ? )", 3)

There is no fully general way of knowing from the outside that one should normalise based on the keys of the third parameter, then return the memoised output (here, the SQL) together with the remaining values. Here, one would:

  • split the internals of the SQL-generator out into a memoisable method
  • make a _fractionate_where method, usable both by the above and by the public select (as below), that returns array-refs of the normalisable keys and of the values, in matching order
  • have the normalise method use its knowledge of the relevant object state (e.g. quote_char) plus the "fractionated" keys of \%where to produce a normalised input
  • lookup / calculate the SQL using normal memoisation as already discussed
  • have the select method use that, then return the SQL plus the "fractionated" values
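The steps above can be sketched with a deliberately tiny, hypothetical ToySQL class rather than SQL::Abstract itself, whose internals I am not reproducing here. Only the shape of the query goes into the cache key: the quote character, table, fields and sorted WHERE keys. The values pass straight through as bind values. The method names follow the list above, but the class, its SQL generation and its handling of only a plain string for the fields are assumptions.

```perl
use strict;
use warnings;

package ToySQL;

my $generated = 0;   # counts how often SQL is actually built
my %sql_cache;

sub new { my ($class, %opt) = @_; return bless { quote_char => $opt{quote_char} // '' }, $class }

# Split \%where into array-refs of keys and values, in matching (sorted) order.
sub _fractionate_where {
    my ($self, $where) = @_;
    my @keys = sort keys %$where;
    return (\@keys, [ @{$where}{@keys} ]);
}

# Normalised input: relevant object state plus the structural parts only.
sub normalise {
    my ($self, $table, $fields, $keys) = @_;
    return join "\0", $self->{quote_char}, $table, $fields, @$keys;
}

# Memoisable part: builds SQL from the structure and never sees the values.
sub _select_sql {
    my ($self, $table, $fields, $keys) = @_;
    my $key = $self->normalise($table, $fields, $keys);
    return $sql_cache{$key} //= do {
        $generated++;
        my $q   = $self->{quote_char};
        my $sql = "SELECT $fields FROM $q$table$q";
        $sql .= ' WHERE ( ' . join(' AND ', map { "$q$_$q = ?" } @$keys) . ' )' if @$keys;
        $sql;
    };
}

# Public method: memoised SQL plus the "fractionated" values as bind values.
sub select {
    my ($self, $table, $fields, $where) = @_;
    my ($keys, $values) = $self->_fractionate_where($where // {});
    return ($self->_select_sql($table, $fields, $keys), @$values);
}

package main;

my $sql = ToySQL->new;
my ($stmt, @bind) = $sql->select('tickets', '*', { id => 3 });
print "$stmt | @bind\n";   # SELECT * FROM tickets WHERE ( id = ? ) | 3
($stmt, @bind) = $sql->select('tickets', '*', { id => 4 });   # SQL comes from the cache
```

Because the values never reach the cache key, queries that differ only in their values share a single cache entry.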

"Someone should" (which means I will) make an Attribute::Memoize::Defaults (or a better name - weigh in!) that implements the above, then submit a pull request making SQL::Abstract use it.

Perl community, your thoughts are welcome!

JSON::Transform - transform JSON-able data structures without code

Version 0.01 of JSON::Transform is now on CPAN. It lets you express transformations of JSON-able data (i.e. data made up only of hashes, arrays, simple scalars and booleans) concisely and declaratively, without writing any other code.

Locations within the data structure are expressed using JSON Pointer (RFC 6901). Both user-set and automatically system-set variables are available, and they can be interpolated into such locations so that the locations are computed.
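As a reminder of how RFC 6901 locations resolve, here is a minimal resolver in plain Perl, written for illustration. It is not JSON::Transform's code or syntax, and the function name `json_pointer_get` is hypothetical. It follows the RFC's rules: an empty pointer means the whole document, `~1` unescapes to `/` before `~0` unescapes to `~`, and array indices are decimal with no leading zeros.

```perl
use strict;
use warnings;

# Resolve an RFC 6901 JSON Pointer against a Perl data structure.
# Returns the referenced value, or dies if the location does not exist.
sub json_pointer_get {
    my ($data, $pointer) = @_;
    return $data if $pointer eq '';                # "" means the whole document
    die "Invalid pointer '$pointer'\n" unless $pointer =~ m{^/};
    my @tokens = split m{/}, $pointer, -1;         # -1 keeps empty tokens
    shift @tokens;                                 # drop the part before the leading "/"
    for my $token (@tokens) {
        $token =~ s{~1}{/}g;                       # unescape "~1" first...
        $token =~ s{~0}{~}g;                       # ...then "~0", as the RFC requires
        if (ref $data eq 'HASH') {
            die "No key '$token'\n" unless exists $data->{$token};
            $data = $data->{$token};
        } elsif (ref $data eq 'ARRAY') {
            die "Bad array index '$token'\n"
                unless $token =~ /^(?:0|[1-9][0-9]*)$/ && $token < @$data;
            $data = $data->[$token];
        } else {
            die "Cannot descend into a non-container at '$token'\n";
        }
    }
    return $data;
}

my $doc = { 'a/b' => 1, 'm~n' => 8, tickets => [ { id => 3 }, { id => 4 } ] };
print json_pointer_get($doc, '/tickets/1/id'), "\n";   # 4
print json_pointer_get($doc, '/a~1b'), "\n";           # 1
```

Computed locations, as described above, amount to building the pointer string from variables before resolving it this way.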

What is this for?

Be…

XML::Invisible - writing parsers without pain or code

Version 0.03 of XML::Invisible is now on CPAN. This lets you write parsers that produce XML-like Abstract Syntax Trees (AST), or actual XML documents, without writing any code. Why did I write it?

Parsing: a tiny introduction

Parsing is turning a text input into semantically valuable output. It is often broken into stages: lexing (turning the initial text into tokens, with errors detected if invalid tokens are given), parsing (structuring those tokens into ASTs, with errors if the structure is wrong), and later processing (doing something with the AST).
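To make the three stages concrete, here is a hand-written toy in plain Perl for the one expression shape used later in this post, (a+b). It only illustrates the stages and is not how Pegex or XML::Invisible work. The `lex` and `parse` names and the variable values are hypothetical.

```perl
use strict;
use warnings;

# Stage 1: lexing. Turn text into tokens; die on anything unrecognised.
sub lex {
    my ($text) = @_;
    my @tokens;
    while (length $text) {
        if    ($text =~ s/^\s+//)     { next }
        elsif ($text =~ s/^([()+])//) { push @tokens, [ PUNCT => $1 ] }
        elsif ($text =~ s/^([a-z])//) { push @tokens, [ NAME  => $1 ] }
        else                          { die "Lex error at '$text'\n" }
    }
    return @tokens;
}

# Stage 2: parsing. Structure tokens into an AST for: "(" NAME "+" NAME ")"
sub parse {
    my @t = @_;
    my @shape = ([PUNCT => '('], [NAME => undef], [PUNCT => '+'], [NAME => undef], [PUNCT => ')']);
    die "Parse error: expected 5 tokens, got " . @t . "\n" unless @t == @shape;
    for my $i (0 .. $#shape) {
        my ($type, $want) = @{ $shape[$i] };
        die "Parse error at token $i\n"
            unless $t[$i][0] eq $type && (!defined $want || $t[$i][1] eq $want);
    }
    return { op => $t[2][1], left => $t[1][1], right => $t[3][1] };
}

# Stage 3: later processing, kept separate, gives the AST a meaning.
my %env   = (a => 2, b => 3);
my $ast   = parse(lex('(a + b)'));
my $value = $env{ $ast->{left} } + $env{ $ast->{right} };
print "$value\n";   # 5
```

Because stage 3 only sees the AST, a different meaning (pretty-printing instead of evaluation, say) needs no change to `lex` or `parse`.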

There are a number of ways of writing parsers generally. The most maintainable way is using as much of a declarative style as possible, usually by writing a grammar. There are various options in Perl, including Marpa, Parse::RecDescent, and Pegex. For each of these, you have to write a grammar (obviously), and write some code to handle the text inputs and parsing results.

Side note: I believe that completely separating the parsing process from semantically operating on the AST is good engineering in almost all circumstances. The only exception I have seen so far is the parsing example often used to compare parsing frameworks: data transformations like parsing JSON. For these, there is not a great deal of point in constructing an AST, then immediately converting it into another, semantically very similar, data structure. For anything more semantically complex, I claim you should get your AST, then give it semantic meaning with other (hopefully highly declarative!) code.

Discussion of XML::Invisible

XML::Invisible follows the concept of Pegex, in having a single grammar that effectively does both lexing and (highly controllable) parsing, but does not do any later processing.

I wanted to get so declarative that I could iterate on the grammar and inspect the parse outputs (i.e. ASTs) without touching any actual code. This was because I was creating a new language for expressing arbitrary transformations of JSON-able data structures without writing code, and I knew that I wouldn't be done with the language design until the simple example transformations were parsing into a comprehensible AST. Every time I tweaked the grammar, I had to change the Pegex receiver to match the updated shape, which was a slow and painful process. I felt there must be a better way!

Having stumbled across Steven Pemberton's "Invisible XML" concept a little while ago, I eventually had the idea of using it together with Pegex to achieve code-free parsing. XML::Invisible works by creating a Pegex receiver that generates ASTs semantically equivalent to simple XML documents (representing nodes with attributes, and nodes that are plain text). It [mis]uses a couple of existing Pegex grammar annotations to control the shape of those XML documents.

Let's look at the fairly simple grammar from Steven's draft specification for parsing trivial arithmetic expressions, adapted to Pegex format. We won't begin with the end product (given in the POD for XML::Invisible), which showcases ixml's features. Instead, let's start with the most naive grammar, then iterate on it until we get where we want to go:

expr: open arith close
open: /( LPAREN )/
close: /( RPAREN )/
arith: left op right
left: name
right: name
name: /(a)/ | /(b)/
op: sign
sign: /( PLUS )/

Put this in a file called grammar.pgx. Then install the necessary modules, and parse a trivial expression, (a+b), into an XML document:

cpanm XML::Invisible XML::Twig
perl -MXML::Invisible=make_parser -e 'print make_parser(join "", <>)->(q[(a+b)])->toStringC14N(1)' grammar.pgx | xml_pp

That gives this XML output:

<expr>
  <open>(</open>
  <arith>
    <left>
      <name>a</name>
    </left>
    <op>
      <sign>+</sign>
    </op>
    <right>
      <name>b</name>
    </right>
  </arith>
  <close>)</close>
</expr>

Let's turn the open and close, and also the sign, into attributes, using the + annotation, like this:

expr: +open arith +close
open: /( LPAREN )/
close: /( RPAREN )/
arith: left op right
left: name
right: name
name: /(a)/ | /(b)/
op: +sign
sign: /( PLUS )/

which gives:

<expr close=")" open="(">
  <arith>
    <left>
      <name>a</name>
    </left>
    <op sign="+"></op>
    <right>
      <name>b</name>
    </right>
  </arith>
</expr>

Now let's flatten out the arith entity, which doesn't add much, using the - annotation. This also demonstrates what flattening does to the op entity inside arith: its attribute is incorporated into the containing entity. The new grammar:

expr: +open -arith +close
open: /( LPAREN )/
close: /( RPAREN )/
arith: left -op right
left: name
right: name
name: /(a)/ | /(b)/
op: +sign
sign: /( PLUS )/

gives:

<expr close=")" open="(" sign="+">
  <left>
    <name>a</name>
  </left>
  <right>
    <name>b</name>
  </right>
</expr>

As a final tweak, let's pretend it's more useful to have the left entity's name be an attribute rather than a contained node, and to flatten the right entity's name out - achieved by adding two characters to the grammar, and changing no code at all. Grammar:

expr: +open -arith +close
open: /( LPAREN )/
close: /( RPAREN )/
arith: left -op right
left: +name
right: -name
name: /(a)/ | /(b)/
op: +sign
sign: /( PLUS )/

Result:

<expr close=")" open="(" sign="+">
  <left name="a"></left>
  <right>b</right>
</expr>
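Here is a small plain-Perl model of the two annotations' effect on tree shape, built from the unannotated tree and reproducing the outputs above. It is my own sketch of the semantics as described in this post, not XML::Invisible's implementation, which does this inside a Pegex receiver during parsing. The `node`, `shape` and `to_xml` names are hypothetical. Each annotation sits on a rule reference inside a parent rule, so annotations are keyed by parent name and then child name.

```perl
use strict;
use warnings;

# A node is { name => ..., attrs => {...}, kids => [...] }; kids are nodes or strings.
sub node { my ($name, @kids) = @_; return { name => $name, attrs => {}, kids => [@kids] } }

sub text_of {
    my ($n) = @_;
    return ref $n ? join('', map { text_of($_) } @{ $n->{kids} }) : $n;
}

# Apply annotations bottom-up. $ann->{parent}{child} is '+' (the child becomes
# an attribute of the parent) or '-' (the child's attributes and kids are
# flattened into the parent).
sub shape {
    my ($n, $ann) = @_;
    return $n unless ref $n;
    my $out = { name => $n->{name}, attrs => { %{ $n->{attrs} } }, kids => [] };
    for my $kid (map { shape($_, $ann) } @{ $n->{kids} }) {
        my $mark = ref $kid ? ($ann->{ $n->{name} }{ $kid->{name} } // '') : '';
        if ($mark eq '+') {
            $out->{attrs}{ $kid->{name} } = text_of($kid);
        } elsif ($mark eq '-') {
            %{ $out->{attrs} } = (%{ $out->{attrs} }, %{ $kid->{attrs} });
            push @{ $out->{kids} }, @{ $kid->{kids} };
        } else {
            push @{ $out->{kids} }, $kid;
        }
    }
    return $out;
}

sub esc { my ($s) = @_; $s =~ s/&/&amp;/g; $s =~ s/</&lt;/g; $s =~ s/"/&quot;/g; return $s }

# Serialise with attributes sorted by name, as in the outputs above, but unindented.
sub to_xml {
    my ($n) = @_;
    return esc($n) unless ref $n;
    my $attrs = join '', map { qq{ $_="} . esc($n->{attrs}{$_}) . '"' } sort keys %{ $n->{attrs} };
    return "<$n->{name}$attrs>" . join('', map { to_xml($_) } @{ $n->{kids} }) . "</$n->{name}>";
}

# The unannotated tree for (a+b), as in the first XML output.
my $raw = node('expr',
    node(open => '('),
    node('arith',
        node(left  => node(name => 'a')),
        node(op    => node(sign => '+')),
        node(right => node(name => 'b'))),
    node(close => ')'));

# Annotations from the "-arith" grammar, then with the final left/right tweak.
my %flattened = (expr  => { open => '+', arith => '-', close => '+' },
                 arith => { op => '-' },
                 op    => { sign => '+' });
my %final = (%flattened, left => { name => '+' }, right => { name => '-' });

print to_xml(shape($raw, \%flattened)), "\n";
print to_xml(shape($raw, \%final)), "\n";
```

As in the walkthrough, going from one output to the next changes only the annotation data and leaves the code alone.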

Relationship with XML Schema Definition (XSD)

It should be entirely possible to take a Pegex grammar written for use with this module and generate an XSD from it, since all the required information is available. However, I have not yet written any code to do so. Contributions, or even just informative GitHub issues (the MetaCPAN page links to the repository), are extremely welcome!

Future directions

One idea is to use the given grammar to allow turning the AST (i.e. XML document) back into a canonicalised version of the input document.

Acknowledgements

Thanks to the mighty Joel Berger for helping make this more coherent!