Data::Table - Weekly Travelling in CPAN
Distribution: Data-Table
Module version: 0.38
Main Contributors: Yingyao Zhou (EZDB), Guangzhou Zou
License: [perl]
Near the Christmas in 2022, I played a data engineering challenge called "Hanukkah of Data 5783". You can find it on https://hanukkah.bluebird.sh/5783 . Players can download data of either .CSV, .JSON or .SQLITE format.
The first* task is like this:
... And their phone number was their last name spelled out. I didn’t know what that meant, but apparently before there were smartphones, people had to remember phone numbers or write them down. If you wanted a phone number that was easy-to-remember, you could get a number that spelled something using the letters printed on the phone buttons: like 2 has “ABC”, and 3 “DEF”, etc."
* There is a "zeroth" task on the calendar year.
I didn't mind perish my skill on SQL, but these kinds of things immediately I thought of Perl, maybe there would be a subroutine like this:
sub phonebutton { return 2 if $_[0] =~ /[abc]/; return 3 if $_[0] =~ /[def]/; return 4 if $_[0] =~ /[ghi]/; return 5 if $_[0] =~ /[jkl]/; return 6 if $_[0] =~ /[mno]/; return 7 if $_[0] =~ /[pqrs]/; return 8 if $_[0] =~ /[tuv]/; return 9 if $_[0] =~ /[wxyz]/; }
But what tools in Perl ecosystem could help me?
I found the CPAN module Data::Table. It is a tool for .xls and .csv file.
I use Data::Table to solve all those challenges. Data::Table has a very nice cookbook by its main author and can be found on https://sites.google.com/site/easydatabase/ .
Spoiler Alert: The following is how I solved one of the "Hanukkah of Data 5783" challenge. Here I suggest you reading the Data::Table cookbook (it is really nicely formatted and organized!) and try the Hanukkah of Data(it is really fun!) on yourself.
Puzzle statement on Day 5: https://hanukkah.bluebird.sh/5783/5/
The point is to find a lady living in Queens Village and having a habit of buying a lot of pet food. There is a category of product started with "PET". So my strategy was looking for the top few people spending the greatest amount of money or buying largest amount of pet food over the course of time.
I had to combine three tables: the table of customer details (including name, phone number, birthday, living area, customer ID), the table of orders and the table of items on each item. To combine tables, there is usually a JOIN operation. Relational database users know there are LEFT_JOIN, RIGHT_JOIN, INNER_JOIN and FULL_JOIN(OUTER_JOIN). Data::Table supports these JOINs.
Use of subTable() makes the operations less consuming.
sort([header], Data::Table::NUMBER, Data::Table::DESC) helped me to look for those pet lovers.
Here is my code:
# Day 5 of "Hanukkah of Data 5783" use v5.30.0; use warnings; use Data::Table; use List::Util qw/sum/; my $t = Data::Table::fromCSV("noahs-customers.csv"); my $u = Data::Table::fromCSV("noahs-orders.csv"); my $v = Data::Table::fromCSV("noahs-orders_items.csv"); my $t_p = $t->match_pattern_hash( '$_{citystatezip} =~ /Queens Village/' ); say $t_p->nofRow; my $v_p = $v->match_pattern_hash( '$_{sku} =~ /^PET/' ); my $u_p = $u->subTable([0..$u->nofRow-1], ["orderid", "customerid"]); my $r = $t_p->join($u_p, Data::Table::INNER_JOIN, ['customerid'], ['customerid']) ->join($v_p, Data::Table::LEFT_JOIN, ['orderid'], ['orderid']) ->group(['customerid'], ['qty'], [ sum ], ['numofpetfood']); $r->sort('numofpetfood', Data::Table::NUMBER, Data::Table::DESC); say $r->elm(0, "name"), " ", $r->elm(0, "phone"), " ", $r->elm(0, "numofpetfood"); say $r->elm(1, "name"), " ", $r->elm(1, "phone"), " ", $r->elm(1, "numofpetfood");
The difference of amount of purchases between the first pet food frequent buyer and the second pet food frequent buyer is huge. So there is no doubt who the person the puzzle is looking for.
I have put my solution of each day as gists on GitHub.
Data::Table
Leave a comment