Perl Weekly Challenge 110: Valid Phone Numbers and Transposed File

These are some answers to Week 110 of the Perl Weekly Challenge organized by Mohammad S. Anwar.

Spoiler Alert: The deadline for this weekly challenge is in a couple of days (May 2, 2021). This blog post offers some solutions to this challenge; please don’t read on if you intend to complete the challenge on your own.

Task 1: Valid Phone Numbers

You are given a text file.

Write a script to display all valid phone numbers in the given text file.

Acceptable Phone Number Formats:

+nn  nnnnnnnnnn
(nn) nnnnnnnnnn
nnnn nnnnnnnnnn

Input File:

0044 1148820341
 +44 1148820341
  44-11-4882-0341
(44) 1148820341
  00 1148820341

Output

0044 1148820341
 +44 1148820341
(44) 1148820341

This is obviously a job for regular expressions (or regexes). I will not even consider a language or solution that does not use them. I will not use a separate text file but simulate it with an array of strings or some other means.

Valid Phone Numbers in Raku

Please remember that Raku’s regexes aim to renew the subject and differ in some ways from traditional Perl or Perl-compatible regexes. Among other things, whitespace is usually not significant in a regex pattern (unless you use an option to force it).

use v6;

my @tests = " 0044 1148820341 42 ", "  +44 1148820342 abc", 
            " 44-11-4882-0343 ", " (44) 1148820344  ", " 00 1148820345";

my $ten-dgts = rx/\d ** 10/;
for @tests -> $str {
    say ~$0 if $str ~~ / ( [ \d ** 4 || '+' \d\d || \( \d\d \) ] \s+ <$ten-dgts> ) /;
}

To make things clearer, the regex above could be rewritten as:

(                 # Capture content within parens
  [               # group items within the [] alternative
    \d ** 4 ||    # Four digits or...
    '+' \d\d ||   # + sign and 2 digits, or ...
    \( \d\d \)    # two digits within parentheses
  ]               # end of the alternative
  \s+             # spaces
  <$ten-dgts>     # Ten-digits regex
)                 # end of capture

The above program displays the following output:

$ raku ./phone.raku
0044 1148820341
+44 1148820342
(44) 1148820344

Valid Phone Numbers in Perl

This is a port to Perl of the above Raku program. Note that we have included a test case in which there are two phone numbers in the same input line.

use strict;
use warnings;
use feature "say";

# simulate a text file with an array of strings
my @tests = (" 0044 1148820341 42 ", "  +44 1148820342 abc", 
            " 44-11-4882-0343 ", " (44) 1148820344 foo (39) 1148820345", " 00 1148820346");

for my $str (@tests) {
    say $1 while $str =~ / ( (?: \d {4} | \+ \d\d | \( \d\d \)  ) \s+ \d{10} ) /gx;
}

This script displays the following output:

$ perl phone.pl
0044 1148820341
+44 1148820342
(44) 1148820344
(39) 1148820345
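
The Raku and Perl programs above accept a phone number anywhere in the string, so a line such as " 0044 1148820341 42 " counts as valid. Under a stricter reading of the task, where each line must consist of a single phone number, the pattern has to be anchored. Here is a minimal sketch of that variant in Python (one of the guest languages used below). It assumes that surrounding whitespace is tolerated; the names VALID_LINE and valid_numbers, and the file name phone.txt, are hypothetical.

```python
import re

# Anchored variant: the whole line, apart from surrounding whitespace,
# must be a single phone number in one of the three accepted formats.
VALID_LINE = re.compile(r"\s*((?:\d{4}|\+\d\d|\(\d\d\))\s+\d{10})\s*")


def valid_numbers(lines):
    """Yield the phone number of each line that consists of nothing else."""
    for line in lines:
        m = VALID_LINE.fullmatch(line)
        if m:
            yield m.group(1)


# Simulated file contents; with a real file, pass the open file object instead,
# e.g. valid_numbers(open("phone.txt")), where phone.txt is a hypothetical name.
sample = [
    "0044 1148820341\n",
    " +44 1148820341\n",
    "  44-11-4882-0341\n",
    "(44) 1148820341\n",
    "  00 1148820341\n",
    "foo 0044 1148820341 42\n",
]
for number in valid_numbers(sample):
    print(number)
```

With this sample, only the first, second and fourth lines are reported; the last one is rejected because of the extra text around the number.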

Valid Phone Numbers in Other Languages

Phone Numbers in Scala

We need to import the scala.util.matching.Regex class from the Scala standard library. Note that every backslash appears twice in the pattern of the program below. This is because in Java and Scala, a single backslash is an escape character in a string literal, not a regular character that shows up in the string. So instead of ‘\’, you need to write ‘\\’ to get a single backslash in the string.

import scala.util.matching.Regex

object phoneNumber extends App {
  val pattern = "((?:\\d{4}|\\+\\d\\d|\\(\\d\\d\\))\\s+\\d{10})".r
  val tests = Array(
    " 0044 1148820341 42 ",
    "  +44 1148820342 abc",
    " 44-11-4882-0343 ",
    " (44) 1148820344  (33) 1148820345",
    " 00 1148820346"
  );
  for (str <- tests) {
    if (pattern.unanchored.matches(str)) {
      println((pattern findAllIn str).mkString(", "))
    }
  }
}

Output:

0044 1148820341
+44 1148820342
(44) 1148820344, (33) 1148820345

Phone Numbers in Python

This program uses the re module of the Python standard library:

import re 

tests = ("foo 0044 1148820341 42", "xyz +44 1148820342 abc", "44-11-4882-0343", " (44) 1148820344  ", "00 1148820345")

for line in tests:
    # raw string, so that the backslashes reach the regex engine unchanged
    match = re.search(r"(\d{4}|\+\d\d|\(\d\d\))\s+\d{10}", line)
    if match:
        print(match.group())

Output:

$ python3 phone.py
0044 1148820341
+44 1148820342
(44) 1148820344
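
The Perl and Scala versions above can report several phone numbers found on the same line, whereas re.search stops at the first one. The following is a sketch of how this could be done in Python with re.finditer, using the re.VERBOSE flag to lay the pattern out with comments, much like the annotated Raku regex. The names PHONE and find_phone_numbers are mine, chosen for illustration.

```python
import re

# Same alternatives as in the pattern above, laid out with comments.
# re.VERBOSE makes whitespace in the pattern insignificant, as in Raku or Perl's /x.
PHONE = re.compile(r"""
    (?: \d{4}          # four digits, or
      | \+ \d\d        # a plus sign and two digits, or
      | \( \d\d \)     # two digits within parentheses
    )
    \s+                # one or more whitespace characters
    \d{10}             # ten digits
""", re.VERBOSE)


def find_phone_numbers(line):
    """Return every phone number found in line, from left to right."""
    return [m.group() for m in PHONE.finditer(line)]


tests = ("foo 0044 1148820341 42", " (44) 1148820344 foo (39) 1148820345", "44-11-4882-0343")
for line in tests:
    for number in find_phone_numbers(line):
        print(number)
```

This should print 0044 1148820341, (44) 1148820344 and (39) 1148820345, one per line, and nothing for the last test string.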

Phone Numbers in Awk

Awk was one of the first programming languages to include regular expressions, well before Perl, so it was an obvious guest language candidate for this task. I had a bit of trouble getting it to work properly because the \d and [:digit:] character classes did not work as I expected on the platform where I tested it. As far as I can tell, \d is a Perl-style shorthand that is not part of awk’s POSIX regular expressions, and the POSIX class has to be written inside a bracket expression, as [[:digit:]]. I used [0-9] instead, which is a simple solution, but I wasted quite a bit of time before I figured out what was wrong. Note that \s, used below, is a GNU awk extension. Here, we’re using a shell pipe with an awk one-liner:

$ echo '
0044 1148820341
+44 1148820342
44-11-4882-0343
(44) 1148820344
00 1148820346
' | awk '/([0-9]{4}|\+[0-9]{2}|\([0-9]{2}\))\s+[0-9]{10}/ { print $0 }'
0044 1148820341
+44 1148820342
(44) 1148820344

Phone Numbers in Julia

No need to import a dedicated library in Julia, since regexes are built into the language.

tests = ["foo 0044 1148820341 42", "xyz +44 1148820342 abc", 
         "44-11-4882-0343", " (44) 1148820344  ", "00 1148820345"]
pattern = r"(\d{4}|\+\d\d|\(\d\d\))\s+\d{10}"

for str in tests 
    m = match(pattern, str)
    if m !== nothing
        println(m.match)
    end
end

Output:

$ julia phone.jl
0044 1148820341
+44 1148820342
(44) 1148820344

Phone Numbers in Ruby

For some reason the \d character class and the \+ literal plus sign don’t seem to work on my Ruby installation, although they should if I understand the documentation correctly. So, I used the [0-9] and [+] character classes instead.

tests = ["foo 0044 1148820341 42", "xyz +44 1148820342 abc", 
         "44-11-4882-0343", " (44) 1148820344  ", "00 1148820345"]
pattern = %r{(([0-9]{4}|[+][0-9]{2}|\([0-9]{2}\))\s+[0-9]{10})}
for str in tests
    match = str.match(pattern)
    if match then
        print(match[0], "\n")
    end
end

Output:

0044 1148820341
+44 1148820342
(44) 1148820344

Phone Numbers in Rust

Here, I have chosen to use a single string containing several phone numbers as input and check that we can extract several valid phone numbers from that input string.

use regex::Regex;

fn main() {
    let pattern = Regex::new(r"((\d{4}|\+\d{2}|\(\d{2}\))\s+\d{10})").unwrap();
    let test = "foo 0044 1148820341 42 xyz +44 1148820342 abc 
        44-11-4882-0343 (44) 1148820344 00 1148820345";
    for matches in pattern.captures_iter(test) {
        println!("{:?}", &matches[0]);
    }
}

Output:

"0044 1148820341"
"+44 1148820342"
"(44) 1148820344"

Task 2: Transpose File

You are given a text file.

Write a script to transpose the contents of the given file.

Input File

name,age,sex
Mohammad,45,m
Joe,20,m
Julie,35,f
Cristina,10,f

Output:

name,Mohammad,Joe,Julie,Cristina
age,45,20,35,10
sex,m,m,f,f

For practical reasons, I will not use an external file but simulate it in various ways.

Transpose File in Raku

We simulate the input file with an array of strings. The program takes the @input array of strings, reads each line in turn (as we would do with an actual file), splits each line on commas, and stores the individual items in a @transposed array of arrays. At the end, we just need to output the rows of the @transposed array.

use v6;

my @input = <name,age,sex Mohammad,45,m 
             Joe,20,m Julie,35,f Cristina,10,f>;

my @transposed;
for @input -> $in {
    my $i = 0;
    for $in.split(',') -> $str {
        push @transposed[$i], $str;
        $i++;
    }
}
for @transposed -> @line {
    say @line.join(',');
}

This program displays the following output:

$ raku ./transpose.raku
name,Mohammad,Joe,Julie,Cristina
age,45,20,35,10
sex,m,m,f,f

Transpose File in Perl

We simulate the input file with a space-separated string. The construction of the @transposed array of arrays follows the same idea as in Raku.

use strict;
use warnings;
use feature "say";

# Note: input array simulated with a string
my $in_string = "name,age,sex  Mohammad,45,m 
         Joe,20,m Julie,35,f  Cristina,10,f";
my @input = split /\s+/, $in_string;
my @transposed;
for my $in (@input) {
    my $i = 0;
    for my $str (split /,/, $in) {
        push @{$transposed[$i]}, $str;
        $i++;
    }
}
for my $line (@transposed) {
    say join ',', @$line;
}

This program displays the following output:

$ perl  transpose.pl
name,Mohammad,Joe,Julie,Cristina
age,45,20,35,10
sex,m,m,f,f

Transpose File in Awk

We pipe the input to the awk program’s standard input.

BEGIN {
    FS = ","
}
NF == 0 { next }    # skip blank lines, such as the one echo adds at the end
{ n++ }             # n counts the data lines seen so far
{ table[0,n] = $1 }
{ table[1,n] = $2 }
{ table[2,n] = $3 }
END {
    for (i = 0; i < 3; i++) {
        for (j = 1; j < n; j++) printf "%s,", table[i,j]
        printf "%s\n", table[i,n]
    }
}

This is an example run:

$  echo 'name,age,sex
> Mohammad,45,m
> Joe,20,m
> Julie,35,f
> Cristina,10,f
> ' | awk -f transpose.awk
name,Mohammad,Joe,Julie,Cristina
age,45,20,35,10
sex,m,m,f,f
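
For comparison, here is a sketch of the same transposition in Python, one of the guest languages used for the first task. Instead of pushing items one at a time into an array of arrays, it relies on the built-in zip function to regroup the fields by column. The transpose function name is mine, and the sketch assumes that all lines have the same number of fields.

```python
def transpose(lines):
    """Transpose comma-separated lines; blank lines are ignored."""
    rows = [line.strip().split(",") for line in lines if line.strip()]
    # zip(*rows) pairs up the first fields, then the second fields, and so on.
    # Note that zip stops at the shortest row, so ragged input gets truncated.
    return [",".join(column) for column in zip(*rows)]


lines = ["name,age,sex", "Mohammad,45,m", "Joe,20,m", "Julie,35,f", "Cristina,10,f"]
for line in transpose(lines):
    print(line)
```

With the sample data of the task, this should display the same three lines as the Raku, Perl and awk programs.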

Wrapping up

Next week’s Perl Weekly Challenge will start soon. If you want to participate in this challenge, please check https://perlweeklychallenge.org/ and make sure you answer the challenge before 23:59 BST (British summer time) on Sunday, May 9, 2021. And, please, also spread the word about the Perl Weekly Challenge if you can.

About laurent_r

I am the author of the "Think Perl 6" book (O'Reilly, 2017) and I blog about the Perl 5 and Raku programming languages.