Because I'm lazy.
I don't want to write separate instructions for argument unpacking.
I don't want to manually check for the presence of mandatory params.
I don't want to implement named params validation.
I need multi subs so I won't have to invent artificial names for methods that do the same stuff.
The list goes on...
If I can do a short and sweet "sub foo ( Str:D :$name! ) { ... }" then I'm happy. The more convenient language won my heart. Simple as that.
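To illustrate - a minimal sketch (greet and notify are made-up names, not from anyone's codebase):

# named param: unpacked, presence-checked and type-checked by the signature
sub greet ( Str:D :$name! ) { say "Hello, $name!" }

greet( name => 'John' );    # Hello, John!
# greet( );                 # dies: required named parameter not passed
# greet( name => Str );     # dies: type check failed, :D wants a defined value

# multi subs: no artificial names for subs doing the same stuff
multi sub notify ( Str:D $email   ) { say "mailing $email" }
multi sub notify ( Int:D $user-id ) { say "pinging user $user-id" }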
I agree with your blog post. Trying to "sell" a language without signatures nowadays is like trying to sell a car without an electric starter motor. It does not matter how many good "How to start the engine with a hand crank" tutorials you have. It. Will. Not. Sell.
As for the syntax - please, please drop this Moose-ish fat arrow abuse. It does not construct a proper Pair object like in Raku, so essentially
has max_size => ( default => method { 20 } )
is pretty much the same as
has ( max_size => default ), method { 20 }
So I think that
has max_size => { default => method { 20 }, ... }
is more natural.
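A minimal Perl 5 sketch of that flattening, with a plain value standing in for "method { 20 }":

use strict;
use warnings;
use Data::Dumper;

# the fat comma is just a comma that quotes its left operand,
# so nested "pairs" collapse into one flat list:
my @args = ( max_size => ( default => 20 ) );

print Dumper \@args;    # $VAR1 = [ 'max_size', 'default', 20 ];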
Length: No one will make the effort to travel to a remote location for single-day workshops. This formula can work in some big cities, where local devs can hop on a bus in the morning and be back home before evening. Otherwise it's not good value for money.
Date: Scheduling the workshops a single week after the Riga conference backfired. Badly. Maybe you should do something unique, like winter workshops? That would fill a niche.
Name: Swiss Perl and Raku Workshops. Simple and descriptive. Or "Alpine Perl and Raku Winter Workshops with Hot Chocolate" :)
Cross technologies: No. Some good general talks are welcome, but talks dedicated to other languages are pointless at small workshops. That's the domain of big conferences, where one can switch tracks to something different - out of curiosity or boredom. Workshops should stay focused on their subject.
Hacking: YES. But there should be some technical lead, or at least a list of proposed issues to resolve in Perl / Raku or the ecosystem. Hackathons where people just sit in the same room doing their own stuff are usually boring.
What else? Blinds and armchairs. Seriously. The contrast was so bad this year that my eyes hurt after one hour. Do you have small cinemas that you could rent as the venue? And bigger name tags :)
Thanks again for making those workshops happen. See you next time.
Consider a common Perl 5 idiom like:
my $name = 'John';
...
sub authenticate {
    my ( $login, $password ) = @_;
}
...
# is the correct data passed?
# or maybe the user authenticates by email?
authenticate( $name, ...);
While in P6 if I have:
sub authenticate ( Str:D :$login, Str:D :$password ) {
}
I will simply name credential variables consistently across the whole codebase to get shorter method calls:
# I am sure this is something the user will be authenticated by
my $login = 'John';
...
authenticate( :$login, ...)
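(For readers less familiar with Perl 6: ":$login" is plain shorthand for a Pair named after the variable, so these two calls are identical:)

authenticate( :$login, :$password );
authenticate( login => $login, password => $password );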
I'm not saying that naming discipline is not possible in Perl 5. But Perl 6 immediately exposes naming discrepancies inside and outside of methods as a code smell.
And because of prison slang this term still means 'the vocabulary of "low or disreputable" people' (Wikipedia quote).
That's a terrible marketing name.
I personally like "Perl 6". In my company a lot of developers know what Perl is, and they associate Perl in general with both line noise and awesome parsing capabilities. And some of them (mostly PHP/JS developers) are familiar with new Perl 6 features like great Unicode support or Rat calculations. So it's a strong brand that is getting positive feedback right now. Rebranding to Rakudo or 6lang basically means throwing away all the marketing done in past years - blog posts, advent calendars, conferences - and starting from scratch with building new brand recognition. It would also cause a lot of confusion, because if someone uses a "Rakudo language" and tries to find out how to resolve some issue on the internet, he will miss all of the existing Perl 6 articles.
The call to attempt() is outside Steps(). That is the very core of declarative programming - separating the process definition from its execution. But this leads to:
Unclear SUCCESS/FAILURE triggering. "If the Wallet( ... has minimum credits ... ) step is not met, does that mean FAILURE should be triggered?" In this case obviously not - if there was no bet, no funds should be taken from the wallet. But this is way too blurry.
Can you briefly describe how you package the Perl 5 ecosystem for a _production_ environment? In my case it goes like this:
1. Build Perl itself. Holy cow, 55MB! The worst bundle ever for slim Docker containers, because it throws everything into one bucket: development dependencies (like Pod::Perldoc, Devel::*, Benchmark, TAP::Parser::* or CPAN::*), build dependencies (ExtUtils::*, Module::*), runtime dependencies (Unicode stuff and pragmas) and of course tons of stuff no one really uses nowadays (NDBM, ODBM, SDBM interfaces, ptar, zipdetails binaries).
I truly hate this "distribution approach". Unbelievable bloatware.
2. Clone it and trim the excessive fat. To do so I use the manifest from the perl-base Debian package as a baseline of what's crucial and what's not. That gives me a clean, minimalistic 7MB Perl.
3. On the Perl from step 1 (the full installation) I run "cpanm Whatever Is Used In My Code". That does all the module testing and bumps the ecosystem to an enormous 300MB package.
4. Now the hard part: installing _runtime_ packages only, on the minimalistic Perl from step 2. I haven't found a good way to do it. It's a mix of horrible symlink hackery (build dependencies must be "borrowed" from the full installation) and partial parsing of META files. I usually install package X used by my code without dependencies, then try "use X" to see what it REALLY needs (META phases are often incorrect, or the package is too old to have distinct phases configured) and install dependencies until X can be loaded and my code that uses X passes tests. A sketch of the META-parsing part follows below.
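The META-parsing part can be approximated with core modules only - a rough sketch (simplified assumptions, not my actual production code):

#!/usr/bin/env perl

# list declared runtime requirements of a distribution from its MYMETA.json
use strict;
use warnings;
use JSON::PP qw(decode_json);

my $meta_file = shift // 'MYMETA.json';
open my $fh, '<', $meta_file or die "Cannot open $meta_file: $!";
my $meta = decode_json( do { local $/; <$fh> } );

# phases are often not correct, so treat this list as a starting point only
my $runtime = $meta->{'prereqs'}{'runtime'}{'requires'} // {};
printf "%s => %s\n", $_, $runtime->{$_} for sort keys %$runtime;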
After a few hours of pain I have a 300MB "devel" ecosystem and a 90MB "production" ecosystem that I can trust and deploy waaaay faster on multiple machines.
Do you go a similar way? Or maybe you have some cool tricks to debloat Perl itself and the too-greedy dependency chains of CPAN modules?
use v6;

my $template = q{
Hi [VARIABLE person]!
You can change your password by visiting [VARIABLE link] .
Best regards.
};

my %fields = (
    'person' => 'John',
    'link'   => 'http://example.com'
);
So we've decided how our template syntax should look, and for starters we'll implement trivial variables (although that's not a very precise name, because variables in templates are almost always immutable).
We also have data to populate the template fields. Let's get started!
1. Substitutions
sub substitutions ( $template is copy, %fields ) {
    for %fields.kv -> $key, $value {
        $template ~~ s:g/'[VARIABLE ' $key ']'/$value/;
    }
    return $template;
}

say substitutions($template, %fields);
Yay, it works:
Hi John!
You can change your password by visiting http://example.com .
Best regards.
Now it is time to benchmark it, to get a baseline for the different approaches:
use Bench;

my $template_short = $template;
my %fields_short   = %fields;

my $template_long = join(
    ' lorem ipsum ', map( { '[VARIABLE ' ~ $_ ~ ']' }, 'a' .. 'z' )
) x 100;
my %fields_long = ( 'a' .. 'z' ) Z=> ( 'lorem ipsum' xx * );

my $b = Bench.new;
$b.timethese(
    1000,
    {
        'substitutions_short' => sub {
            substitutions( $template_short, %fields_short )
        },
        'substitutions_long' => sub {
            substitutions( $template_long, %fields_long )
        },
    }
);
Benchmarks in this post test two cases for each approach. Our template from the example above is the "short" case. And there is a "long" case, with a 62KB template containing 2599 text fragments and 2600 variables filled from 26 fields. So here are the results:
Timing 1000 iterations of substitutions_long, substitutions_short...
substitutions_long: 221.1147 wallclock secs @ 4.5225/s (n=1000)
substitutions_short: 0.1962 wallclock secs @ 5097.3042/s (n=1000)
Whoa! That is a serious penalty for long templates. The reason is that this code has three serious flaws: the original template is destroyed during variable evaluation and therefore must be copied each time we want to reuse it, the template text is parsed multiple times, and the output is rewritten after each variable is populated. But we can do better...
2. Substitution
sub substitution ( $template is copy, %fields ) {
    $template ~~ s:g/'[VARIABLE ' (\w+) ']'/{ %fields{$0} }/;
    return $template;
}
This time we have a single substitution. The variable name is captured and used to fetch the field value on the fly. Benchmarks:
Timing 1000 iterations of substitution_long, substitution_short...
substitution_long: 71.6882 wallclock secs @ 13.9493/s (n=1000)
substitution_short: 0.1359 wallclock secs @ 7356.3411/s (n=1000)
A mediocre boost. There is less penalty on long templates, because the text is not parsed multiple times. However, the remaining flaws from the previous approach still apply, and the regexp engine must still do plenty of memory reallocations for each replaced piece of template text.
It also won't allow our template engine to gain new features - like conditions or loops - in the future, because it is very hard to parse nested tags with a single regexp. Time for a completely different approach...
3. Grammars and direct Actions
If you are not familiar with Perl 6 grammars and the Abstract Syntax Tree concept, you should study the official documentation first.
grammar Grammar {
    regex TOP { ^ [ <chunk=text> | <chunk=variable> ]* $ }
    regex text { <-[ \[ \] ]>+ }
    regex variable { '[VARIABLE ' $<name>=(\w+) ']' }
}

class Actions {

    has %.fields is required;

    method TOP ( $/ ) {
        make [~]( map { .made }, $/{'chunk'} );
    }

    method text ( $/ ) {
        make ~$/;
    }

    method variable ( $/ ) {
        make %.fields{$/{'name'}};
    }

}

sub grammar_actions_direct ( $template, %fields ) {
    my $actions = Actions.new( fields => %fields );
    return Grammar.parse($template, :$actions).made;
}
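Before dissecting how it works, a quick sanity check - calling it on our template gives the same output as the substitution approaches:

say grammar_actions_direct( $template, %fields );

# Hi John!
# You can change your password by visiting http://example.com .
# Best regards.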
The most important thing is defining our template syntax as a grammar. A grammar is just a set of named regular expressions that can call each other. At the "TOP" (where parsing starts) we see that our template is composed of chunks, and each chunk is either text or a variable. The regexp for text matches everything until it hits a variable start (the '[' character - let's assume it is forbidden in text, to keep things simple). The regexp for a variable should look familiar from the previous approaches, however this time we capture the variable name as a named capture instead of a positional one.
The Actions class has methods that are called whenever a regexp with the corresponding name is matched. When called, a method gets the match object ($/) from that regexp and can "make" something from it. This "made" something will be seen by the upper-level method when it is called.

For example, our "TOP" regexp calls the "text" regexp, which matches the "Hi " part of the template and triggers the "text" method. This "text" method just "make"s the matched string for later use. Then the "TOP" regexp calls the "variable" regexp, which matches the "[VARIABLE name]" part of the template. The "variable" method is then called; it looks up the variable name in the match object and "makes" the value of this variable from the %fields hash for later use. This continues until the end of the template string. Then the "TOP" regexp is matched and the "TOP" method is called. This "TOP" method can access the array of text and variable "chunks" in the match object and see what was "made" for those chunks earlier. So all it has to do is "make" those values concatenated together. And finally we get this "made" template from the "parse" method. So let's look at the benchmarks:
Timing 1000 iterations of grammar_actions_direct_long, grammar_actions_direct_short...
grammar_actions_direct_long: 149.5412 wallclock secs @ 6.6871/s (n=1000)
grammar_actions_direct_short: 0.2405 wallclock secs @ 4158.1981/s (n=1000)
We got rid of two more flaws from the previous approaches. The original template is not destroyed when fields are filled in, which means less memory copying. There is also no memory reallocation during substitution of each field, because now every action method just "make"s strings to be joined later. And we can easily extend our template syntax by adding loops, conditions and other features, just by throwing some regexps into the grammar and defining the corresponding behavior in the actions. Unfortunately we see some performance regression, because every time the template is processed it is parsed, match objects are created, a parse tree is built, and all those "make"/"made" values must be tracked when the tree is collapsed to the final output. But that was not our final word...
4. Grammars and closure Actions
Finally we have reached the "boss level", where we have to exterminate the last and greatest flaw - re-parsing.
The idea is to use grammars and actions like in the previous approach, but this time, instead of producing the output directly, we want to generate executable and reusable code that works like this under the hood:
sub ( %fields ) {
    return join '',
        sub ( %fields ) { return "Hi " }.( %fields ),
        sub ( %fields ) { return %fields{'person'} }.( %fields ),
        ...
}
That's right - we will be converting our template body into a cascade of subroutines.
Each time this cascade is called it will receive %fields and propagate it to the deeper subroutines.
Each subroutine is responsible for handling the piece of template matched by a single regexp in the grammar. We can reuse the grammar from the previous approach and modify only the actions:
class Actions {

    method TOP ( $/ ) {
        my @chunks = $/{'chunk'};
        make sub ( %fields ) {
            return [~]( map { .made.( %fields ) }, @chunks );
        };
    }

    method text ( $/ ) {
        my $text = ~$/;
        make sub ( %fields ) { return $text; };
    }

    method variable ( $/ ) {
        my $name = $/{'name'};
        make sub ( %fields ) { return %fields{$name} };
    }

}

sub grammar_actions_closures ( $template, %fields ) {
    state %cache{Str};
    my $closure = %cache{$template} //= Grammar.parse(
        $template, actions => Actions.new
    ).made;
    return $closure( %fields );
}
Now every action method, instead of making the final output, makes a subroutine that will receive %fields and produce the final output later. To generate this cascade of subroutines the template has to be parsed only once. Once we have the cascade, we can call it with different sets of %fields to populate the template variables. Note how an object hash, %cache, is used to check whether we already have a subroutine tree for a given $template. Enough talking, let's crunch some numbers:
Timing 1000 iterations of grammar_actions_closures_long, grammar_actions_closures_short...
grammar_actions_closures_long: 22.0476 wallclock secs @ 45.3563/s (n=1000)
grammar_actions_closures_short: 0.0439 wallclock secs @ 22778.8885/s (n=1000)
A nice result! We have an extensible template engine that is 4 times faster for short templates and 10 times faster for long templates than our initial approach. And yes, there is a bonus level...
4.1. Grammars and closure Actions in parallel
The last approach opened up a new optimization possibility. If we have subroutines that generate our template, why not run them in parallel? Let's modify our action "TOP" method to process text and variable chunks simultaneously:
method TOP ( $/ ) {
    my @chunks = $/{'chunk'};
    make sub ( %fields ) {
        return [~]( @chunks.hyper.map( { .made.( %fields ) } ).list );
    };
}
This optimization will shine if your template engine must do some lengthy operation to generate a chunk of the final output - for example execute a heavy database query or call some API. It is perfectly fine to ask for data on the fly while populating the template, because in a feature-rich template engine you may not be able to predict and generate the complete set of data needed beforehand, like we did with our %fields. Use this optimization wisely - for fast subroutines you will see a performance drop, because the cost of sending chunks to and retrieving them from threads is higher than just executing them serially on a single core.
Which approach should I use to implement my own template engine?
That depends on how much you can reuse templates. For example, if you send one password reminder per day - go for simple substitution, and reach for the grammar with direct actions if you need more complex features. But if you use templates, for example, in PSGI processes to display hundreds of pages per second to different users, then the grammar and closure actions approach wins hands down.
You can download all approaches with benchmarks in a single file here.
To be continued?
If you liked this brief introduction to template engines and want to see more complex features like conditions or loops implemented, leave a comment under this article on blogs.perl.org or send me a private message on the irc.freenode.net #perl6 channel (nick: bbkr).
It has too much "goto" smell for my taste, where inner-block flow control can jump over an outer scope block.
Also "else" is a boolean word, and this may lead to confusion in situations like this:

my @x = (0);
while (shift @x) { ... } else { ... }
say 123;

Does it count as an iteration if the array was not empty and an array item was checked, but the first block was never entered?
But even such a simple feature has more pitfalls than you can imagine. For example, a user has three contacts living in the Europe/Warsaw, America/Phoenix and Australia/Sydney time zones.
The obvious validation is to exclude nonexistent days - for example, the user cannot select 2017-02-29, because 2017 is not a leap year. But what if he wants to send a message at 2017-03-26 02:30:00? For America/Phoenix this is a piece of cake - just a 7 hour difference from UTC (or unix time). For Australia/Sydney things are a bit more complicated, because they use daylight saving time and this is their summer, so an additional time shift must be calculated. And for Europe/Warsaw this will fail miserably, because they are just changing to summer time - clocks jump from 01:59 straight to 03:00 - so 02:30 simply does not exist and some fallback algorithm must be used.
So for a single date and time there are 3 different algorithms that have to be tested!
Unfortunately most time-dependent code does not expose any interface for passing the current time to emulate all edge cases - methods usually call time( ) or DateTime->now( ) internally. So let's test such a black box: it takes the desired date, time and time zone, and returns how many seconds are left before the message should be sent.
package Timers;

use DateTime;

sub seconds_till_send {
    my $when = DateTime->new( @_ )->epoch( );
    my $now  = time( );
    return ( $when > $now ) ? $when - $now : 0;
}
The output of this method changes over time. To test it in a consistent manner we must override the system time( ) call:
#!/usr/bin/env perl

use strict;
use warnings;

BEGIN {
    *CORE::GLOBAL::time = sub () { $::time_mock // CORE::time };
}

use Timers;
use Test::More;

# 2017-03-22 00:00:00 UTC
$::time_mock = 1490140800;

is Timers::seconds_till_send(
    'year' => 2017, 'month' => 3, 'day' => 26,
    'hour' => 2, 'minute' => 30,
    'time_zone' => 'America/Phoenix'
), 379800, 'America/Phoenix time zone';
Works like a charm! We have a consistent test that pretends our program runs at 2017-03-22 00:00:00 UTC, which means there are 4 days, 9 hours and 30 minutes left till 2017-03-26 02:30:00 in America/Phoenix.
We can also test the DST case in Australia:
# 2017-03-25 16:00:00 UTC
$::time_mock = 1490457600;

is Timers::seconds_till_send(
    'year' => 2017, 'month' => 3, 'day' => 26,
    'hour' => 2, 'minute' => 30,
    'time_zone' => 'Australia/Sydney'
), 0, 'Australia/Sydney time zone';
Because during DST Sydney is 11 hours ahead of UTC instead of 10, when we run our program at 2017-03-25 16:00:00 UTC the requested hour has already passed there, and the message should be sent instantly. Great!
But what about the nonexistent hour in Europe/Warsaw? We need to fix this method to return some useful values in DWIM-ness spirit instead of crashing. And I haven't told you the whole, scary truth yet, because we have to solve two issues at once here. The first is the nonexistent hour - in this case we want to calculate seconds to the nearest possible hour after the requested one, so 03:00 Europe/Warsaw should be used if 02:30 Europe/Warsaw does not exist. The second is the ambiguous hour that happens when clocks are moved backwards - for example 2017-10-29 02:30 Europe/Warsaw occurs twice during that day. In this case the first occurrence should be taken, so if 02:30 Europe/Warsaw is both at 00:30 UTC and 01:30 UTC, seconds are calculated to the former. Yuck...
For simplicity let's assume the user cannot schedule a message more than one year ahead, so only one DST-related time change will take place. With that assumption the fix may look like this:
sub seconds_till_send {
    my %params = @_;
    my $when;

    # expect ambiguous hour during summer to winter time change
    if ( DateTime->now( 'time_zone' => $params{'time_zone'} )->is_dst ) {

        # attempt to create ambiguous hour is safe
        # and will always point to latest hour
        $when = DateTime->new( %params );

        # was the same hour one hour ago?
        my $tmp = $when->clone;
        $tmp->subtract( 'hours' => 1 );

        # if so, correct to earliest hour
        if ( $when->hms eq $tmp->hms ) {
            $when = $when->epoch - 3600;
        }
        else {
            $when = $when->epoch;
        }
    }

    # expect nonexistent hour during winter to summer time change
    else {
        do {
            # attempt to create nonexistent hour will die
            $when = eval { DateTime->new( %params )->epoch( ) };

            # try next minute maybe...
            if ( ++$params{'minute'} > 59 ) {
                $params{'minute'} = 0;
                $params{'hour'}++;
            }
        } until defined $when;
    }

    my $now = time( );
    return ( $when > $now ) ? $when - $now : 0;
}
If your eyes are bleeding, here is the TL;DR. First we determine which case we may encounter, by checking whether DST is currently in effect in the requested time zone. For the nonexistent hour we brute-force it into the next possible time by adding one-minute periods and adjusting the hour when minutes overflow. There is no need to adjust days, because DST changes never happen at the date change. For the ambiguous hour we check whether subtracting one hour gives the same wall-clock time (yep). If so, we have to correct the unix timestamp to get the earliest occurrence.
But what about our tests? Can we still write them in a deterministic and reproducible way? Luckily it turns out that DateTime->now( ) uses time( ) internally, so no additional hacks are needed.
# 2017-03-26 00:00:00 UTC
$::time_mock = 1490486400;

is Timers::seconds_till_send(
    'year' => 2017, 'month' => 3, 'day' => 26,
    'hour' => 2, 'minute' => 30,
    'time_zone' => 'Europe/Warsaw'
), 3600, 'Europe/Warsaw time zone nonexistent hour';
Which is the expected result - 02:30 is not available in Europe/Warsaw, so 03:00 is taken, which is already in the DST season and 2 hours ahead of UTC.
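The ambiguous autumn hour can be pinned down the same way - here is a sketch of such a test (timestamps computed by hand, so double-check them before relying on this): the earliest occurrence of 02:30 Europe/Warsaw on 2017-10-29 is 00:30 UTC, half an hour after our mocked now.

# 2017-10-29 00:00:00 UTC (02:00 in Warsaw, still DST)
$::time_mock = 1509235200;

is Timers::seconds_till_send(
    'year' => 2017, 'month' => 10, 'day' => 29,
    'hour' => 2, 'minute' => 30,
    'time_zone' => 'Europe/Warsaw'
), 1800, 'Europe/Warsaw time zone ambiguous hour';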
Now let's solve the leap seconds issue, where - because the Moon is slowing down Earth's rotation - you may encounter a 23:59:60 hour every few years. OK, OK, I'm just kidding :) However, in good tests you should also take leap seconds into account if needed!
I hope you learned from this post how to fake time in tests to cover weird edge cases.
Before you leave I have 3 more things to share:
If the override is installed after a module is already loaded:

use DateTime;

BEGIN {
    *CORE::GLOBAL::time = sub () { $::time_mock // CORE::time };
}

$::time_mock = 123;
say DateTime->now( );

it won't have any effect due to sub folding - DateTime's calls to time( ) were already resolved when it was compiled, so install the override before loading any module you want to affect.
The empty signature matters too - if you drop it:

BEGIN {
    # no empty signature
    *CORE::GLOBAL::time = sub { $::time_mock // CORE::time };
}

$::time_mock = 123;

# parsed as time( +10 )
# $y = 123, not what you expected!
my $y = time + 10;

Keep the empty signature, because you cannot guarantee that everyone always used parentheses when calling time( ).
I'm not a FASTA expert, but my approach would be to use ">" as the input separator instead of "\n", and then simply push 100 lines (= sequences) into each file through rotor:
# beware of the first empty element
writer($_) for $fasta.IO.lines( nl-in => '>' ).[1..*].rotor(100, :partial);

sub writer (@seqs) {
    ...
    # restore sequence start character
    $out.print('>', $_) for @seqs;
};
Of course TIMTOWTDI, however - from my experience - the delimiter approach is waaaaaaay faster than buffer juggling.
$ df -k -P
Filesystem                                   1024-blocks       Used  Available Capacity Mounted on
/dev/disk3                                    1219749248  341555644  877937604      29% /
devfs                                                343        343          0     100% /dev
/dev/disk1s4                                   133638140  101950628   31687512      77% /Volumes/Untitled
map -hosts                                             0          0          0     100% /net
map auto_home                                          0          0          0     100% /home
map -fstab                                             0          0          0     100% /Network/Servers
//Pawel%20Pabian@biala-skrzynka.local./Data   1951417392 1837064992  114352400      95% /Volumes/Data
/dev/disk5s2                                  1951081480 1836761848  114319632      95% /Volumes/Time Machine Backups
bbkr@localhost:/Users/bbkr/foo 123            1219749248  341555644  877937604      29% /Volumes/osxfuse
(if you see wrapped or trimmed output, check the raw one here)
And while this looks nice for humans, it is a tough task for a parser.
So let's use Perl 6 features to deal with this mess.
Capture command line output.
my ($header, @volumes) = run('df', '-k', '-P', :out).out.lines;
Method run executes an external command and returns a Proc object. Method out creates a Pipe object to receive the command's output. Method lines splits this output into lines - the first goes into the $header variable, the remaining ones into the @volumes array.
Parse header.
my $parsed_header = $header ~~ /
    ^
    ('Filesystem')  \s+
    ('1024-blocks') \s+
    ('Used')        \s+
    ('Available')   \s+
    ('Capacity')    \s+
    ('Mounted on')
    $
/;
We do it because the match object keeps each capture, and each capture knows the from and to positions at which it matched, for example:
say $parsed_header[1].Str;
say $parsed_header[1].from;
say $parsed_header[1].to;
Will return:
1024-blocks
44
55
That will help us a lot with dynamic column widths!
Extract row values.
First we have to look at the border between the Filesystem and 1024-blocks columns. Because Filesystem is aligned to the left and 1024-blocks is aligned to the right, data from both columns can occupy the space between those headers, for example:
Filesystem                 1024-blocks
/dev/someverybigdisk   111111111111111
me@host:/some directory 123 222222222222
          |                          |
          |<------- possible ------->|
          |<----- border space ----->|
We cannot simply split rows on whitespace. But we know where the 1024-blocks column ends, so the number that ends at the same position is our volume size. To extract it we can use another useful Perl 6 feature - the regexp position anchor.
for @volumes -> $volume {
    $volume ~~ / (\d+) <.at($parsed_header[1].to)> /;
    say 'Volume size is ' ~ $/[0] ~ 'KB';
}
That finds the sequence of digits aligned to the position of the end of the header. Every other column can be extracted using this trick, as long as we know the data alignment.
$volume ~~ /
    # first column is not used by module, skip it
    \s+

    # 1024-blocks, right aligned
    (\d+) <.at($parsed_header[1].to)> \s+

    # Used, right aligned
    (\d+) <.at($parsed_header[2].to)> \s+

    # Available, right aligned
    (\d+) <.at($parsed_header[3].to)> \s+

    # Capacity, middle aligned, never longer than header
    <.at($parsed_header[4].from)> \s* (\d+ '%') \s* <.at($parsed_header[4].to)> \s+

    # Mounted on, left aligned
    <.at($parsed_header[5].from)> (.*)
    $
/;
Profit!
By using header name positions and position anchors in regular expressions, we got a bomb-proof df parser for macOS that works with regular disks, pendrives, NFS / AFS / FUSE shares, weird directory names and different escaping schemas.
BTW: How did you start your career in bioinformatics? Was your primary education in biology/genetics, with Perl as a tool to solve your tasks, or was it the other way around - you were a bored programmer who thought one day "it would be cool to sequence my hamster and save it to a hard drive"?
What is the bio knowledge threshold required to start working at a bioinformatics company?