My Perl Conference in Amsterdam Takeaway

Inspired by domm, I'm posting my takeaway of the Perl Conference in Amsterdam (which I wanted to present in-house anyway):

  • First up, I was heavily impressed by the hospitality and friendliness of the organizers! They really created an environment where you feel welcome - I'm sure the great coffee and good sandwiches played an important role.
  • The organizers did a really good job figuring out the schedule: with up to four talks running in parallel, there was always something relevant for me, while everyone converged in the main hall for the must-see talks.
  • Damian Conway is working to bring the Perl 6 class syntax to Perl 5 with Dios. For the syntax, he has to extend the Perl 5 parser, which has been possible since 5.12/5.14 from C code, or from Perl code using Lukas Mai's Keyword::Simple. The challenge is that either technique requires you to process the Perl source code that follows the keyword yourself and do the rewriting you want to achieve. Parsing could be handed to the existing PPI module, which creates a syntax tree for you - and is therefore slow. To make it run fast, Damian created Keyword::Declare, which uses his new module PPR containing a single regexp grammar that can successfully parse any Perl source code! You have to see this 1600+ line beauty for yourself.
  • For such a regexp beast to be debugged without going bananas, you'll need Regexp::Debugger
  • Acme::LooksOfDisapproval finally allows me to express my feelings towards some of our codebase
  • if you ever need to build an OAuth2 server, Lee Johnson's Net::OAuth2::AuthorizationServer is the way to go
  • last time I checked, I didn't find any JSON Web Token implementations - by now, there are a few, including Crypt::JWT and domm's plack middleware Plack::Middleware::Auth::JWT
  • Leon Timmermans is working on a successor of Module::Build and Module::Build::Tiny: Dist::Build, not yet available on CPAN.
  • If you "just" want to compile Perl and install it into a non-standard location, use Perl::Build
  • if you use DBI, the module DBIx::LogAny will help you produce proper logs of your database activities
  • Larry Wall reminded us that it doesn't matter how long something took to complete, once it's done - think of the Tolkien books. And that he was the only one who wasn't supposed to burn out during the 15+ years of Perl 6 development.
  • domm reminded me that I don't know any of the modern SQL like Common Table Expressions and Window Functions.
  • Bart Wiegmans showed how mere mortals can improve the Perl 6 JIT compiler
  • there are alternatives to cpanm, including cpm that installs in parallel
  • mst wrote opan as a way to have a "small" CPAN mirror with private distributions (a.k.a. darkpan)
  • metacpan now allows you to grep through all of CPAN with grepcpan
  • I haven't come across OpenAPI specifications so far, so thanks to Lee Johnson for pointing me at APIs.guru
  • Paul Evans showed his impressive work on the async and await keywords, implemented in Future::AsyncAwait, to finally make code using Futures neat. Can't wait till this gets out of alpha...
  • Damian impressed us by doing quantum mathematics using Perl 6
  • Stevan Little works on Moxie, a successor of his Moose object system, but this time suitable for inclusion in the Perl 5 core. Where Moose creates data structures for all the meta things it does, Moxie tries to achieve the same with minimal startup and memory overhead by introspecting what is already there. Looks very promising to me! I like the fact that constructors are now strict by default, and that appending a question mark to an attribute's name makes it optional - however, I think it would be great to be able to specify the rw/ro attribute within the has statement.
  • For programs that grew out of shell scripts, IPC::Run3::Shell will make them feel more shell-script-like again
  • Flávio Glock made our heads explode when presenting his work on Perlito. He is able to compile Perl 5 and Perl 6 code to other languages, including Perl 6, Perl 5, JavaScript and Java. It can even translate itself, so you can have a stand-alone Perl interpreter in a browser. Wow!
  • Sawyer presented the new features of Perl 5.26. Most important for me are stable lexical subroutines and indented here-docs. In addition to 5.24's stabilization of postfix dereferencing, it's time to upgrade at $work!
  • For database update handling, use domm's DBIx::SchemaChecksum or sqitch
  • Ovid reminded us that unit-test code is real code and needs to be cared for properly. Thus, use his Test::Class::Moose (we already do in many projects). He also showed his add-on Test::Class::Moose::History to track test history over time, which hasn't made it to CPAN yet.
  • Finally, I'm satisfied with how my Adding Structured Data to Log::Any talk worked out - it actually seemed to make sense to some people. I even noticed a few pictures of my slides being taken - I wasn't expecting that... Now I'm impatiently waiting for the video to be uploaded to YouTube - thanks, Sawyer, for taking on this enormous chore. After my talk, I was pointed to fluentd and graylog, which I'll have to look into now.

Thanks a lot for organizing the conference, Amsterdam.pm and AmsterdamX.pm, I'm looking forward to seeing you all in Glasgow.

Add Structured Data to Log::Any

NB: The proposed changes have been accepted; there is now a release candidate waiting for your feedback.

A few months ago, I proposed a few changes to Log::Any with the goal of

  1. extend the API between applications and Log::Any::Adapters to allow passing hashrefs in addition to the log message
  2. allow applications to specify localized context data for Log::Any to add to each log message in that scope
  3. allow system owners to globally specify a default adapter other than Null

To get the bigger picture, you might want to revisit the original blog post and the reply by dagolden.

Based on the follow-up discussion on- and offline, I've tried to come up with a pull request that implements the proposed changes. To give it a try, clone it, and use my sample adapter to produce some output:

$ git clone https://github.com/mephinet/Log-Any.git
$ git clone https://github.com/mephinet/perl-Log-Any-Adapter-Screen.git
$ git clone https://gist.github.com/c1e48e0a6d6bbd5949483e24aeac9961.git sample
$ perl -ILog-Any/lib/ -Iperl-Log-Any-Adapter-Screen/lib/ sample/sample.pl

For you as an application developer, if this pull request gets applied, you can add structured data to your log lines:

$log->info("program started",
    {progname => $0, pid => $$, perl_version => $]});

Legacy adapters will receive this data flattened and appended to the log message, whereas adapters with support for structured data will receive the hashref as-is and can process it directly (e.g., send it to ElasticSearch).
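With a legacy adapter, the call above would thus come out as one plain line of text, roughly like this (the exact flattening format is up to the proxy, so treat this only as an approximation with made-up values):

program started {progname => "./myapp.pl", pid => 12345, perl_version => "5.026"}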

Furthermore, programs will be able to specify localized context data by adding key-value-pairs to the $log->context hash:

use File::Basename qw(basename);

$log->context->{directory} = $dir;
for my $file (glob "$dir/*") {
    local $log->context->{file} = basename($file);
    $log->warn("Can't read file!") unless -r $file;
}

When a log line is written, all context data is appended to the line (for legacy adapters) or merged into the hash (for structured data adapters).

Finally, by specifying the $LOG_ANY_DEFAULT_ADAPTER environment variable, you can override the default Null adapter.
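For example, to keep an otherwise-unconfigured script from silently discarding its messages, you could point it at the Stderr adapter that ships with Log::Any (the script name here is just a placeholder):

$ LOG_ANY_DEFAULT_ADAPTER=Stderr perl -ILog-Any/lib/ your_script.pl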

For you as a Log::Any::Adapter developer, with this pull request applied, you can provide a structured method instead of implementing info, warn, error, etc.:

sub structured {
    my ($self, $level, $category, @args) = @_;
    # ... process and log all @args
}

The method will receive level and category (as strings), followed by all arguments that were passed to the log method, unaltered, along with all data that has been added to $log->context.

Please let me know what you think of these proposed changes!

Our Adventures in Logging

NB: I've written a follow-up post

Before we start: this is my first post at blogs.perl.org. Awesome that you're reading it despite me being a noob! :)

Three years ago, I started to change the way I think about logging in applications. Before that, to me it was just printing lines into text files, in order to later read and grep my way through them.

Then, a friend pointed me to ElasticSearch and Kibana, with their infinite ways to search through, and visualize, large amounts of textual data. Due to the nature of my $work, I was well aware of the benefits of a full-text search database; however, it soon became clear that text alone was only part of the deal. The real fun in ElasticSearch begins when you add structured data to it, allowing you to filter, group, and get all businessy with pie charts and world maps and whatnot. This is where you start to gain knowledge that wouldn't be available anywhere in text-based logfiles.

Around the same time, the Austrian Perl Workshop took place in Salzburg, where Tim Bunce gave a great talk called "Application Logging in the 21st Century". Watch it now, I'll wait! It's worth your time...

Besides being really inspiring, the talk showed me that I already had all the components in place that I would need to perform intelligent, context-aware logging. We had an ElasticSearch installation, Kibana running, and our applications produced tons of logfiles, so there was a real itch to scratch. We should be able to feel improvements in no time.

Then reality hit. We would have to rewrite 200,000 lines of application code to produce structured output, which would take our four-person team a bazillion years to complete. Besides, what was that code supposed to look like, and what mechanism should it use? Most code just called print wherever the programmer saw fit. Obviously a long way to go...

Pushed by the same friend again, we started to use Log::Any for a greenfield project, and after a few rounds of getting it wrong, it started to feel like the right thing to do: just use $log->error, $log->warn and friends in our module code, and don't care about where the application wants the output to go - let the application configure the Log::Any::Adapters.

Oh, did I mention that our system was split into over 500 applications? We started configuring the Log::Any::Adapter in situ, but soon realized that getting this consistent, the way we wanted it, was an impossible task. Even worse: when you change a module to use $log->error, but it is used by some application that hasn't set up the Log::Any stack (yet), the less-than-awesome default logger "null" will happily swallow all your error messages, never to be seen again.

So we went back to the drawing board - literally (see the whiteboard photo, logging-targets.jpg) - to find out where we expected our log output to end up. Sounds stupid, but when you start to discuss that topic, the complexity unfolds in front of your weary eyes.

Once this was settled, one of our colleagues implemented a module that would, upon import, sense the context the program was executed in (e.g. interactively, as a daemon, as a CGI, as a cronjob, as a unit test, you name it) and configure the Log::Any stack accordingly, as specified on the whiteboard. This was a tremendous step forward: now an application just needed to add a single line, and logging would happen exactly the way we wanted. Mission accomplished.
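To give you an idea, a stripped-down sketch of such a module could look like the following - the package name, the detection heuristics and the adapter choices are made up for illustration and are not our actual implementation:

package Our::Logging;    # hypothetical name, for illustration only
use strict;
use warnings;
use Log::Any::Adapter;

sub import {
    # Guess the context the program runs in and configure Log::Any accordingly.
    if ($ENV{HARNESS_ACTIVE}) {             # running under a test harness
        Log::Any::Adapter->set('Test');
    }
    elsif ($ENV{GATEWAY_INTERFACE}) {       # running as a CGI
        Log::Any::Adapter->set('File', '/var/log/myapp/cgi.log');
    }
    elsif (-t STDERR) {                     # interactive use
        Log::Any::Adapter->set('Stderr');
    }
    else {                                  # daemon, cronjob, ...
        Log::Any::Adapter->set('Syslog');
    }
}

1;

With something like that in place, "use Our::Logging;" is the single line each application needs.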

Still, no structured data to be seen anywhere, still just plain old stupid lines of text. The Logstashy way would probably be to write parsers in Ruby to extract the structured data out of our logged text. However, that feels really wrong to me - writing parsers for unstructured plain text. This might be feasible for well-known log formats like Apache's access logs, but not for whatever comes out of our 200,000 lines of legacy application code happily printing into whatever log file happens to be around.

Instead, what we really want to do is specify additional context data in the spot where we perform the logging, and have the Log::Any adapters figure out what to do with it. So instead of:

$log->errorf('Failed to find element %s in file %s', $elem, $file);

I'd prefer to write:

$log->error('Failed to find XML element', {element => $elem, file => $file, pid => $$, program => basename($0), weather => 'shiny'});

It holds the same information, but now has structured data which can be inserted directly into ElasticSearch, giving you the shiniest histogram ever drawn of the top missing XML elements per zodiac. Since we were already using our own implementation of a Log::Any::Adapter, it shouldn't be difficult to get this to work, right? Wrong, because before your adapter even sees a log line, Log::Any::Proxy goes ahead and flattens all your arguments into a single string.

In our desperation, we created a local fork of Log::Any::Proxy that refrains from doing so - but now we're doomed to always use our Proxy and our adapter, so this is not an agreeable situation.

Last week, that same friend showed us a small Go program he'd developed for us. I was blown away by the Go way of logging structured data (actually, the Logrus way) - if you don't know it, take a look at Logrus' WithFields. This is even cooler than what I had dreamed up. With that, I would be able to say:

use Log::Any '$log';
$log->context->{pid} = $$;

sub init {
    $log->context->{weather} = 'shiny';
}

sub parse_xml {
    my ($fn) = @_;
    local $log->context->{file} = $fn;
    ...
    $log->error('Failed to find XML element', {element => $elem});
    ...
}

Is it possible that this idea hasn't yet come to anyone in the Perl community? Of course not, as I found out upon researching: Log::Log4perl has something it calls Mapped Diagnostic Context - so there's no reason to be green-eyed towards Go.

However, I failed to find this in the Log::Any world. Am I missing something? If not, then it's time to write the first Log::Any Enhancement Proposals this small world has ever seen:

LAEP 1: Pass hashref as last argument of log functions on to supporting adapters

Extend Log::Any::Proxy to know whether its adapter is aware of hashrefs. If it isn't, continue to work as today. If it is, and the last argument is a hashref, pass it on to the adapter so it can process it as it pleases.

Adapters would need to register their ability with the Log::Any stack - I'm sure we'll find a pretty way to do so.
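To illustrate what I have in mind - and this is just a sketch of one possible mechanism, not a finished design or the actual Log::Any internals - the proxy could simply probe the adapter, and fall back to today's flattening behaviour otherwise:

sub _dispatch_to_adapter {    # hypothetical helper, for illustration only
    my ($adapter, $level, $category, @args) = @_;
    if ($adapter->can('structured')) {
        # hashref-aware adapter: hand over all arguments unaltered
        $adapter->structured($level, $category, @args);
    }
    else {
        # legacy adapter: flatten a trailing hashref into the message
        require Data::Dumper;
        my $data = (@args && ref $args[-1] eq 'HASH') ? pop @args : undef;
        my $msg  = join ' ', @args;
        $msg .= ' ' . Data::Dumper->new([$data])->Terse(1)->Indent(0)->Dump if $data;
        $adapter->$level($msg);
    }
}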

LAEP 2: Expose an API to Log::Any for modules to add (localized) context data

Add an API to Log::Any that exposes a way to add, modify and remove context data. Make sure you can add it in a localized way, so that you don't have to remove it manually at the end of a block. For every call to a log function, add all the context data to the log line. If the adapter in use is hashref-aware, merge the context and the log line's hashref before passing it to the adapter. If the adapter is not hashref-aware, treat the context as if it were specified in the log line, i.e., Data::Dumper its key-value pairs onto the end of the log line.

LAEP 3: Change the default Adapter from Null to Stderr

I don't know what the original intention was in choosing this default, but it has really bitten us in the ass several times since we started to use Log::Any. Once you have a large enough codebase with many entry points (i.e., applications) sharing the same modules, the only way to make sure you don't lose log output is to have every single application set up a Log::Any stack before any of your modules starts using Log::Any. If I had known this beforehand, it would have been a no-go for Log::Any. To me, logging to Stderr would be the POLA thing to do, unless configured to do otherwise.
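Until something along these lines changes, the defensive workaround is to have every entry point pick an adapter explicitly before any module logs anything, e.g. right at the top of the script:

use Log::Any::Adapter ('Stderr');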

So long, and thanks for staying with me through this long post. I hope I was able to get my point across - now I'm really curious what you think about these three proposals. Please go ahead and use the comments section. If the general feeling toward this is positive, and if the Log::Any folks are in favor, I might even try to come up with a pull request for the ideas... Future me... Enjoy your weekend.