Extract Mail Addresses from CSV
What do you think about the code below?
I have a file containing information about people, where the fifth field of each tab-separated line contains the mail address.
Each mail address appears multiple times, but I need to print each one only once.
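use strict;
use warnings;
use feature 'say';    # needed for say

my %mails;
open my $csv, '<', 'my.csv' or die $!;
while (<$csv>) {
    chomp;    # drop the newline in case the address is the last field
    my $mail = ( split(/\t/) )[4];    # fifth tab-separated field
    $mails{$mail} = 1;                # hash keys are unique
}
say foreach sort keys %mails;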
(and how do I indent code on blogs.perl.org?)
thx @Aristotle
The way you do it in HTML: with <pre> around <code>. I fixed it for you, check your post.
Not really what you are asking, but assuming you don't want to use a module such as Text::CSV, don't need to validate your email addresses, and this isn't part of a larger program, you can do this right in the shell with:
cat my.csv | cut -f 5 | sort | uniq
Just an FYI.
Your example should work fine, as long as none of your fields ever contain a tab character. If they do, then in proper CSV they will be quoted in some manner, and sorting out things like that is what modules like Text::CSV are for. But for a one-off task where you know what your data looks like, there's nothing wrong with this.
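To make the tab caveat concrete, here is a small sketch with two made-up lines (the names, cities and addresses are invented for illustration). The second line has a quoted field that contains a tab, so a plain split shifts every later field by one position.

```perl
use strict;
use warnings;

# Hypothetical sample data, invented for this illustration.
my $plain  = join "\t", 1, 'Ann', 'Lee', 'Oslo', 'ann@example.org';
my $quoted = join "\t", 2, 'Bob', qq{"Ray\tJr."}, 'Rome', 'bob@example.org';

# Same extraction as in the post: the fifth tab-separated field.
sub fifth_field {
    my ($line) = @_;
    return ( split /\t/, $line )[4];
}

print fifth_field($plain),  "\n";    # the address, as intended
print fifth_field($quoted), "\n";    # the city: the embedded tab shifted the fields
```

If your data can look like the second line, that is the point where a real parser earns its keep.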
Thanks for your answers. Very interesting!
What I especially like is
$mail = ( split(/\t/) )[4];
First split $_ at the tabs, then wrap the result in () so it can be subscripted as a list, then take the fifth element and assign it to $mail.
At the moment I am confident that I will not have to think more than five seconds to understand it.
But I would not directly assign to the hash:
$mails{ ( split(/\t/) )[4] } = 1
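The steps described above, plus one possible reason to keep the intermediate variable, can be sketched roughly like this. The sample lines and the `defined` check are my own assumptions, not part of the original script: a line with fewer than five fields leaves $mail undefined, and a named variable gives you a place to test for that before touching the hash.

```perl
use strict;
use warnings;

# Hypothetical input lines, invented for this sketch.
my @lines = (
    "1\tAnn\tLee\tOslo\tann\@example.org",
    "2\tBob\tRay\tRome\tbob\@example.org",
    "3\tAnn\tLee\tOslo\tann\@example.org",    # duplicate address
    "4\ttoo\tshort",                          # fewer than five fields
);

my %mails;
for my $line (@lines) {
    # split returns a list; the outer parentheses allow [4] to pick
    # the fifth element out of that list.
    my $mail = ( split /\t/, $line )[4];

    # With a named variable we can skip lines that have no fifth field.
    next unless defined $mail;

    $mails{$mail} = 1;    # hash keys are unique, so duplicates collapse
}

print "$_\n" for sort keys %mails;
```

With the direct form, the short line would instead produce an "uninitialized value" warning and an empty-string key in the hash.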