The map/grep/sort dead end

The first thing I want to talk about is the map, grep and sort functions. I think these functions are somewhat special. If you look at them, you could say they are concepts borrowed from functional programming languages.

All three functions take a code block. All three then go through a list and use that block to do specific things. "map" applies the block to every element and returns a new list. "grep" uses the code block as a filter. And with "sort" you can sort the list however you like.
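To make the three roles concrete, here is a minimal sketch; the variable names are only for illustration.

```perl
use strict;
use warnings;

my @numbers = ( 3, 10, 1, 7 );

# map: run the block for every element and collect the results
my @doubled = map { $_ * 2 } @numbers;

# grep: keep only the elements for which the block returns true
my @big = grep { $_ > 5 } @numbers;

# sort: the block compares two elements, available as $a and $b
my @ascending = sort { $a <=> $b } @numbers;

print "@doubled\n";      # 6 20 2 14
print "@big\n";          # 10 7
print "@ascending\n";    # 1 3 7 10
```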

When I programmed in Perl I used these functions often. Some people probably hate them because they find them hard to read or don't understand them at all. I think once you have understood them they get much easier to read, but the excessive use of "symbols" and the fact that you have to read them from bottom to top still make them harder to read than they should be.

But now I am programming in C#, and at first it seemed that C# didn't have anything like map/grep/sort. But only at first glance. Sooner or later you will find LINQ, which gives you something like map/grep/sort in C#. And at first glance it doesn't seem much different. It looks nearly the same.

As an example: you have an array of ints in Perl and you only want the numbers that are 5 or lower. You would write something like this.

my @ints = ( 1, 2, 8, 3, 6, 7, 12, 3 );
my @low5 = grep { $_ <= 5 } @ints;

In C# it looks something like this

// needs "using System.Linq;" at the top of the file
var ints = new int[] { 1, 2, 8, 3, 6, 7, 12, 3 };
var low5 = ints.Where(e => e <= 5).ToArray();

It looks nearly the same. The only thing to understand is the meaning of "e => e <= 5". That is called a lambda expression, and it does exactly the same as defining an anonymous function. I have seen a lot of people in the C# world call this the "special LINQ syntax" because they think it is specific to LINQ, but it is not. You can use lambdas everywhere. Here Where() is just a method that takes an anonymous function. In Perl or JavaScript it would look something like this.

sub { my ($e) = @_; $e <= 5 }
function(e) { return e <= 5 }

So absolutely nothing special, right? Well, not quite. The difference starts to show when you use more complex examples. Let's take the Schwartzian Transform in Perl

my @unsorted = ( "aaaa", "a", "aa" );
my @sorted = map  { $_->[0] }
             sort { $a->[1] <=> $b->[1] }
             map  { [$_, length($_)] }
             @unsorted;

It sorts the words based on their length. So @sorted will be "a", "aa", "aaaa" in the end. Now in C#

var unsorted = new string[] { "aaaa", "a", "aa" };
var sorted   = unsorted
    .Select(word => new { word, word.Length })
    .OrderBy(t => t.Length)
    .Select(t => t.word);

Well, I think it is far more readable. First, you can read it from top to bottom. Second, because anonymous functions are defined with named parameters, it reads far better than those "$_->[0]" or "$a->[1]" examples. Even if you don't know anything about LINQ or anonymous functions, you don't need to be a genius to guess what "OrderBy(t => t.Length)" does. Probably ordering by length? Yes!

But okay, there really is a special LINQ syntax, because you could also write the above example like this in C#

var sorted = 
    from word in unsorted
    let length = word.Length
    orderby length
    select word;

So how about readability now? What do you think the following C# LINQ example will output?

string[] strings = {
    "A penny saved is a penny earned.",
    "The early bird catches the worm.",
    "The pen is mightier than the sword."
};

var earlyBirdQuery =
    from sentence in strings
    let words = sentence.Split(' ')
    from word in words
    let w = word.ToLower()
    where w[0] == 'a' || w[0] == 'e'
        || w[0] == 'i' || w[0] == 'o'
        || w[0] == 'u'
    select word;

And this is just the beginning, because LINQ has methods for most of the words you already know from SQL: Contains, First, GroupBy, Join, OrderBy, Union, Where, Any, Count, Min, Max and a lot more. LINQ queries can be really readable.

But I think that is still not the important part. What makes the difference is that all of this is implemented with iterators. So for most operations it only needs enough memory to calculate the next entry. You can even have a LINQ expression that reads terabytes of data from disk and it will still work. But even this is not the most important part. LINQ is just an interface, and you can do a lot of different things with it. For example, there is LINQ to XML, so you can use LINQ to extract data from XML. There is also LINQ to SQL, which creates SQL queries from LINQ expressions.
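To show what "implemented with iterators" means in Perl terms, here is a minimal sketch built from closures. The helper names (count_from, imap, igrep, take) are made up for this example and do not come from any CPAN module, and the sketch assumes that undef never occurs as a real element because it is used as the end marker.

```perl
use strict;
use warnings;

# An iterator is a closure that returns the next value on each call,
# or undef once it is exhausted.

# Endless counter starting at $n; nothing is computed until asked for.
sub count_from {
    my ($n) = @_;
    return sub { return $n++ };
}

# Lazy map: transforms one value at a time, on demand.
sub imap {
    my ($code, $it) = @_;
    return sub {
        my $value = $it->();
        return undef unless defined $value;
        local $_ = $value;
        return $code->();
    };
}

# Lazy grep: pulls values until one passes the filter.
sub igrep {
    my ($code, $it) = @_;
    return sub {
        while ( defined( my $value = $it->() ) ) {
            local $_ = $value;
            return $value if $code->();
        }
        return undef;
    };
}

# Pull at most $count values out of an iterator into a normal list.
sub take {
    my ($count, $it) = @_;
    my @result;
    while ( $count-- > 0 ) {
        my $value = $it->();
        last unless defined $value;
        push @result, $value;
    }
    return @result;
}

# The source is infinite, yet only as many values as needed are produced.
my $evens   = igrep( sub { $_ % 2 == 0 }, count_from(1) );
my $squares = imap( sub { $_ * $_ }, $evens );
my @first3  = take( 3, $squares );

print "@first3\n";    # 4 16 36
```

Each stage holds only the value it is currently working on, which is why the same pattern would also work for a source far larger than memory.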

And on top of that there is also PLINQ. So what does the "P" stand for? It stands for "Parallel". To use PLINQ you basically need just one method: add ".AsParallel()" to the query and everything after it is automatically executed on multiple threads, so that all your CPU cores are used. Really nice, isn't it?

So I think LINQ is extremely awesome. But there is one thing that bothers me the most. Everything you have seen about LINQ only became possible with C# 3.0, which was released around 2007. The C# development team had to add a lot of functionality to the language to make something like LINQ possible. Perl, on the other hand, has had all those features since 1994, when Perl 5 was released. So, why don't we have something like LINQ in Perl? I mean, we have an excellent book like "Higher-Order Perl" that explains functional programming in Perl. That book teaches you everything you need to know to implement your own LINQ in Perl. And yet, nobody ever thought about it. Why?

I think the reason is that we already had map, sort and grep built directly into Perl. Absolutely nobody thought about wrapping them in objects and iterators. Why would anyone? On the surface it looks as if you are rebuilding something that is already implemented in the language, with no benefit at all. We already thought it was perfect. And thinking that something is perfect also means it is a dead end.
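As a rough idea of what "wrapping it in objects" could look like, here is a small sketch of the length sort from above written as a method chain. The package name Chain and its methods are hypothetical and invented for this example. It is also eager, not lazy: every step builds a full list.

```perl
use strict;
use warnings;

package Chain;    # hypothetical name, for illustration only

sub new {
    my ($class, @items) = @_;
    return bless [@items], $class;
}

# Every method returns a new Chain, so calls read from top to bottom.
sub transform {
    my ($self, $code) = @_;
    return Chain->new( map { $code->($_) } @$self );
}

sub filter {
    my ($self, $code) = @_;
    return Chain->new( grep { $code->($_) } @$self );
}

# Sort numerically by the key that $key extracts from each element.
sub order_by {
    my ($self, $key) = @_;
    return Chain->new( sort { $key->($a) <=> $key->($b) } @$self );
}

sub to_list {
    my ($self) = @_;
    return @$self;
}

package main;

my @sorted = Chain->new( "aaaa", "a", "aa" )
    ->transform( sub { my ($word) = @_; return +{ word => $word, len => length($word) } } )
    ->order_by(  sub { $_[0]{len} } )
    ->transform( sub { $_[0]{word} } )
    ->to_list;

print "@sorted\n";    # a aa aaaa
```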


Mojo::Collection (part of the Mojolicious toolkit) might satisfy part of what you are describing here, namely the object-chain mechanism. I'm not sure that I would prefer a lambda syntax over Perl's anonymous subroutines tho.

That's one apples to oranges comparison if I've ever seen one (library vs. built-in functions).

> So, why don't we have something like LINQ in Perl?

> Absolutely nobody thought about wrapping it in objects and wrapping it in iterators.

That's a bold claim, are you trolling to be proved wrong? Have a look at CPAN. Off the top of my head: Data::CapabilityBased, List::Gen, Generator::Object, Object::Iterate

There's more.

About Sid Burn

I blog about Perl.