Perl Weekly Challenge 017: Ackermann Function and URL Parsing

By E. Choroba on July 19, 2019 10:55 PM under perl-weekly-challenge

Ackermann Function

A(m, n) = n + 1                  if m = 0
A(m, n) = A(m - 1, 1)            if m > 0 and n = 0
A(m, n) = A(m - 1, A(m, n - 1))  if m > 0 and n > 0

I know that Perl6 supports multisubs, but when I see a function definition of this kind, I always think Erlang, where you get pattern matching and multisubs by default. Here’s how it looks:

-module(ackermann).
-export([ackermann/2]).

ackermann(0, N) ->
    N + 1;
ackermann(M, 0) ->
    ackermann(M - 1, 1);
ackermann(M, N) ->
    ackermann(M - 1, ackermann(M, N - 1)).

Perl solution is a bit less straightforward.

sub ackermann {
    my ($m, $n) = @_;
    return $n + 1               if 0 == $m;
    return ackermann($m - 1, 1) if 0 == $n;
    return ackermann($m - 1, ackermann($m, $n - 1))
}

It’s not so easy, though. ackermann(3, 4) gives several warnings of the following kind:

Deep recursion on subroutine "main::ackermann" at ...

We know that the Ackermann function recurses deeply, so we can safely ignore the warning. Let’s just place

    no warnings 'recursion';

after the first return line. But that’s just half of the story. While recursing many times, it also repeats the same calculations many times. For the 3, 4 case mentioned above, it calculates ackermann(1, 1) (and 3 more) 114 times! It makes it very slow, ackermann(3, 9) takes about 3 seconds on my machine.

I’ve already shown a solution to this problem: Memoize (see Perl Weekly Challenge 013). Adding

use Memoize;
memoize('ackermann');

before the subroutine declaration speeds the computation to about 0.057 secs. ackermann(4, 1) accelerates from 16 mins 15 secs to 0.775 secs. But running ackermann(4, 2) with memoize ends in the system swapping after 1 min 30 secs. Ackermann function is far from practical, at least for greater m’s.

URL Parsing

My first thought was to reach for a module. Someone has already read the RFC’s and many people have tested the solution. I tried URI::URL I’d already had good experience with. But alas, the example URL doesn’t conform to the definition (not even the one given in Wikipedia) because “subschemes” aren’t standard.

#!/usr/bin/perl
use warnings;
use strict;
use feature qw{ say };

use URI::URL;

for my $url (
    'http://choroba:s6cr6t@www.perl.org:80/index.asp?x=12#id',
    'https://127.0.0.1/',
    'ftp://[1:2:3:4:5:6:dead:BEEF]',
    'jdbc:mysql://user:password@localhost:3306/pwc?profile=true#h1',
) {
    my $url_o = 'URI::URL'->new($url);
    for my $component (qw( scheme userinfo host port path query fragment )) {
        if ($url_o->can($component)) {
            my $value = $url_o->$component;
            say "$component:\t$value" if defined $value;
        }
    }
}

So, I had to parse the URL myself. When I hear “parse”, I think Marpa::R2. It makes it possible to declare a grammar and use it to parse strings. I didn’t follow all the details, so I simplified the username, hostname, etc. to be just strings while in reality their format would have been probably more restrictive. It shouldn’t be so hard to update the grammar with the correct constraints, though.


      #! /usr/bin/perl
use warnings;
use strict;
use feature qw{ say };

use Marpa::R2;

my $dsl = << '__DSL__';

:default ::= action => [name,values]
lexeme default = latm => 1

URL            ::= Scheme (':')
                   MaybeAuthority MaybePath MaybeQuery MaybeFragment
                                                        action => build
Scheme         ::= SchemeName
SchemeName     ::= letter SchemeBody                    action => concat
SchemeBody     ::= scheme_char SchemeBody               action => concat
                 | scheme_char                          action => ::first
MaybeAuthority ::=                                      action => ::undef
MaybeAuthority ::= ('//') MaybeUserInfo Host MaybePort  action => host
MaybeUserInfo  ::=                                      action => ::undef
MaybeUserInfo  ::= UserInfo ('@')                       action => ::first
UserInfo       ::= UserName MaybePassword               action => userinfo
UserName       ::= String
MaybePassword  ::=                                      action => ::undef
MaybePassword  ::= (':') Password                       action => ::first
Password       ::= String
Host           ::= Hostname
                 | ('[') IPv6 (']')
IPv6           ::= Hex ':' Hex ':' Hex ':' Hex ':'
                   Hex ':' Hex ':' Hex ':' Hex          action => concat
Hostname       ::= String                               action => ::first
MaybePath      ::= Path                                 action => path
Path           ::=
Path           ::= PathString                           action => ::first
PathString     ::= '/' String                           action => concat
MaybeQuery     ::=                                      action => ::undef
MaybeQuery     ::= ('?') Query                          action => query
Query          ::= QString
MaybeFragment  ::=                                      action => ::undef
MaybeFragment  ::= ('#') Fragment                       action => fragment
Fragment       ::= String
MaybePort      ::=                                      action => ::undef
MaybePort      ::= (':') Port                           action => port
Port           ::= Num
String         ::=                                      action => empty
String         ::= char String                          action => concat
                 | char                                 action => ::first
QString        ::= anychar QString                      action => concat
                 | anychar                              action => ::first
Num            ::= digit Num                            action => concat
                 | digit                                action => ::first
Hex            ::= hex Hex                              action => concat
                 | hex                                  action => ::first

anychar     ~ [\S]
letter      ~ [a-zA-Z]
scheme_char ~ [a-zA-Z+\-.]
char        ~ [\w.]
digit       ~ [0-9]
hex         ~ [0-9a-fA-F]

__DSL__

sub none     {}
sub empty    { "" }
sub host     { assign(host     => $_[0], $_[2]) }
sub port     { assign(port     => @_) }
sub query    { assign(query    => @_) }
sub fragment { assign(fragment => @_) }
sub path     { $_[0]{path} = $_[1] // "" }
sub concat   { join "", @_[ 1 .. $#_ ] }
sub userinfo { $_[0]{username} = $_[1][1], $_[0]{password} = $_[2][1] }
sub build    { $_[0]{scheme} = $_[1][1]; $_[0] }
sub assign   { $_[1]{ $_[0] } = $_[2][1] }

my $grammar = 'Marpa::R2::Scanless::G'->new({source => \$dsl});
for my $url (
    'http://choroba:s6cr6t@www.perl.org:80/index.asp?x=12#id',
    'https://127.0.0.1/',
    'ftp://[1:2:3:4:5:6:dead:BEEF]',
    'jdbc:mysql://user:password@localhost:3306/pwc?profile=true#h1',
) {
    say "URL: $url";
    my $recce = 'Marpa::R2::Scanless::R'->new({
        grammar           => $grammar,
        semantics_package => 'main'});
    $recce->read(\$url);
    my $struct = ${ $recce->value };
    for my $key (sort keys %$struct) {
        say "$key:\t$struct->{$key}";
    }
    say "";
}

You can see that the grammar consists of two kinds of rules. The ones using ::= are the rules of the context free grammar, called G1 or structural by Marpa. The ~ is used in L0 or lexical rules. If several rules share the LHS non-terminal, you can use | instead of the ::= to denote alternatives (not possible for empty rules, though). It’s also possible to use || which means an alternative with a lower priority.

Each rule can be assigned an action (there’s a default action defined at the top) which can do anything you like with the matching RHS phrases. In our example, we build a simple hash that collects the various parts of the URL.

I constructed the above grammar on the basis of the Wikipedia page. It parses the first three URLs correctly:

URL: http://choroba:s6cr6t@www.perl.org:80/index.asp?x=12#id
fragment:	id
host:	www.perl.org
password:	s6cr6t
path:	/index.asp
port:	80
query:	x=12
scheme:	http
username:	choroba

URL: https://127.0.0.1/
host:	127.0.0.1
path:	/
scheme:	https

URL: ftp://[1:2:3:4:5:6:dead:BEEF]
host:	1:2:3:4:5:6:dead:BEEF
path:	
scheme:	ftp

but fails with the jdbs:mysql example:

URL: jdbc:mysql://user:password@localhost:3306/pwc?profile=true#h1
Error in SLIF parse: No lexeme found at line 1, column 6
* String before error: jdbc:
* The error was at line 1, column 6, and at character 0x006d 'm', ...
* here: mysql://user:password@localhost:3306/pwc?profile=t
Marpa::R2 exception at ...

Fortunately, it’s easy to add subschemes to it: just extend the grammar with

Scheme         ::= SchemeName                           action => ::array
                 | SchemeName SubScheme                 action => ::array
SubScheme      ::= (':') SchemeName                     action => ::first

and change the build action to

sub build    {
    $_[0]{scheme}    = $_[1][0];
    $_[0]{subscheme} = $_[1][1] if defined $_[1][1];
    $_[0]
}

And the output is as expected.

URL: jdbc:mysql://user:password@localhost:3306/pwc?profile=true#h1
fragment:	h1
host:	localhost
password:	password
path:	/pwc
port:	3306
query:	profile=true
scheme:	jdbc
subscheme:	mysql
username:	user

2 comments

Tagged as:

ackermann, marpa, parser, perl weekly challenge, url

2 Comments

Mohammad S Anwar | July 20, 2019 12:17 AM | Reply

Sorry for the confusion about subschemes, I have updated the task couple of days ago after couple of members reported about it.

Grinnz | July 22, 2019 6:21 PM | Reply

You might check out URI::Nested to deal with subschemes.

Name

Email Address

URL

Remember personal info?

Comments (You may use HTML tags for style)

About E. Choroba

I blog about Perl.

More info »

E. Choroba