## As simple as possible...but no simpler

The first task of last week's Weekly Challenge

was to generate van Eck's sequence:

` 0, 0, 1, 0, 2, 0, 2, 2, 1, 6, 0, 5, 0, 2, 6, 5, 4, 0,...`

The first challenge is to understand just what van Eck's sequence is,

as the various online explanations are less than instantly helpful.

Van Eck's sequence is a list of integers starting at zero, where the next number

in the sequence is given by the distance between the current number

and the nearest preceding occurrence of that same number.

For example, if the current number at index *N* *(let's call it: Aₙ)* is 7,

then to compute the number at index *N+1* we look back through the

sequence for the most recent preceding occurrence of 7

(at some earlier index *M*). Then the next number in the sequence

is simply the distance between those two occurrences of 7.

Namely: *N - M*

The only complication is that, if there is *no* prior occurrence of the current number

anywhere in the preceding sequence, then the next number is simply zero.

That's less arbitrary than you might suppose: we can think of that zero value

as the distance from *Aₙ* to itself. That is, if there is no preceding occurrence,

then the only possible candidate for a "preceding" occurrence is just *Aₙ* itself,

so the distance in that case is *N - N*, which is zero.

Okay, so that's van Eck's sequence. Now how do we generate it in Raku?

If you've been following my recent blog posts then it probably won't

come as much of a surprise to learn that Raku has an appropriate

built-in tool that will let us accomplish the entire task a single

statement. That tool is the sequence operator.

The `...`

sequence operator takes an initial list of values,

followed by a code object (*i.e.* a block or subroutine),

followed by a termination condition:

` my @sequence = `

`@initial-list``, &code-obj ... $termination;`

It builds a sequence (literally an object of class `Seq`

), starting with the initial values,

then generates extra values after the initial list by calling the code object repeatedly

on the final elements of the initial list. Each time the code object is called, the value

it returns is appended to the sequence, until a value is returned that smartmatches

the termination condition, at which point the sequence is complete. For example:

* # initial code-obj termination*

` my `

`@odd-nums `` = 1, 3, 5, {$^n + 2} ... 999;`

` my `

`@N-powers `` = $N, $N², {$^x * $N} ... * > 1_000_000;`

` my `

`@fibonacci`` = 0, 1, 1, {$^a + $^b} ... ∞;`

These examples differ from van Eck's sequence in one important respect.

Each of them uses only one of the preceding values (or in the case of

Fibonacci, two of them) in order to compute the next. But the next number

in van Eck's sequence depends on both the immediate preceding value (*Aₙ*)

and the *entire* sequence before that...which must be searched to locate

an appropriate previous occurrence of *Aₙ*.

Naturally, Raku supports this kind of sequence too: if your code object uses

the special array parameter `@_`

(or indeed, any other variadic array parameter),

then it will be passed the entire preceding sequence every time.

So now all we need to do is build a code object (in this case: a subroutine)

that, when passed the entire preceding sequence in `@_`

, computes the next

van Eck's number. It might look like this:

` sub `

`next-vanEck's`` {`

` my $N = @_.end;`

` my $M = try ($N-1...0).first: {@_[$^m] == @_[$N]};`

` with $M { return $N - $M }`

` else { return 0 }`

` }`

The final index in the current sequence is given by `@_.end`

(which is like `$#_`

in Perl), so that's our current *N* position.

To find the nearest preceding identical number, we then walk backwards

from index *N-1* down to zero, looking for the first index (`$^m`

)

where the value at that index (*i.e.* `@_[$^m]`

) is equal to the value

at index *N* (*i.e.* `@_[$N]`

). Then we return *N-M* if *M* was found,

or zero otherwise.

By the way:

- That
`with $M {...}`

block is just a convenient shorthand

for:`if $M.defined { my $_ := $M; ...}`

- The
`try`

prefix on the search for*M*is there to handle the edge-case

where`$N`

is zero, in which case`$N-1...0`

would produce

a negative array index, which would raise an exception. - Yes, it's perfectly okay to use an apostrophe or hyphen in a subroutine name

(or in any other identifier) in Raku, provided it's surrounded

by normal identifier characters, so it can't be confused with

the start of a character string.

Once we have a code object that calculates the next van Eck's number,

then the entire van Eck's sequence is just:

` constant `

`vanEck's`` = 0, `

`&next-vanEck's`` ... ∞;`

It's fine to let the sequence run to infinity because `...`

generates

its values lazily: only computing as many elements as are actually needed

to meet a given request.

Once we have our constant sequence, we can ask for any particular index,

at which point `next-vanEck's` will be called as many times as necessary

to compute the requested value:

` say `

`vanEck's``[100]; # prints: 23`

That's certainly a clean, compact, uncluttered solution,

but it was not accomplished as advertised: "in a single statement".

However, the version above is only a starting point,

There are numerous improvements we could make to it

...by thinking and coding more Raku-ishly.

For a start, it's often a red flag when a piece of code iterates though

indexes and then looks up the corresponding array elements. Raku,

in particular, makes it just as easy to iterate those array elements directly

and then find the corresponding index if we need it (as we do here):

` # Version 2...`

` sub `

`next-vanEck's`` {`

` my $N = @_.end;`

` my $M = @_.head(*-1).first(@_.tail):end:k;`

` with $M { return $N - $M }`

` else { return 0 }`

` }`

In this version, we find the index *M* by taking all the elements of the list

prior to the last (`@_.head(*-1)`

), then searching for the first such element

that is equal to the final element (`.first(@_.tail)`

).

But, because we need the closest preceding match, we need to search

*backwards* through the prior elements, so we add an `:end`

adverb to tell

`.first`

to search from the end. Then, when we find a match, we need the

position at which that match was found (not the value that was found there),

so we add a `:k`

adverb to tell `.first`

to return the matching "key"

(*i.e.* the index where the match occurred) instead of the matching value.

This new way of locating the *M* index requires marginally less code,

and is less prone to off-by-one errors, but the real benefit is that it's

also significantly faster. Because it's iterating array elements instead

of array indexes, it doesn't have to do two explicit array look-ups

in order to compare every prior element with the final element.

There's another minor code optimization possible here. As we saw back

at the beginning of this exploration, the special case of returning a zero

if there's no *M* index is equivalent to setting *M* to *N* in that case.

If we do that, then we can just return *N-M*, without ever needing

to test for the special case. Like so:

` # Version 3...`

` sub `

`next-vanEck's`` {`

` my $N = @_.end;`

` my $M = @_.head(*-1).first(@_.tail):end:k `

`//`` `

$N`;`

` return $N - $M;`

` }`

If the `.first`

search fails to find a preceding match, it will return

an undefined value instead, so the trailing defined-or will simply

use the *N* value instead, in which case *N-M* will be zero, as required.

And, of course, having already reduced the code so much, the only reason

to persevere with the `$N`

and `$M`

variables is for self-documentation.

Otherwise, we could simply replace `$N`

with `@_.end`

,

and `$M`

with the new *head-first-tail* search expression, producing:

` # Version 4...`

` sub `

`next-vanEck's`` {`

` @_.end - (@_.head(*-1).first(@_.tail):end:k `

`//`` @_.end)`

` }`

At which point, we hardly need the named subroutine either.

Instead of:

` constant `

`vanEck's`

` = 0, `

`&next-vanEck's`` ... ∞;`

we could simply "inline" the subroutine body as a code block,

producing a genuine single-statement solution:

` constant `

`vanEck's`

` = 0, {@_.end-(@_.head(*-1).first(@_.tail):end:k`

`//``@_.end)}...∞;`

Job done!

Or is it? Sure, if we now write:

` say `

`vanEck's``[100]; # Should print: 23`

...we get 23 printed out, as expected.

But what happens if we write:

` say `

`vanEck's``[10_000]; # Should print: 14`

Well, we *do* get 14 printed out...eventually.

About a minute later. :-(

If we then tried `vanEck's``[100_000]`

, we'd have time

for a long, leisurely lunch break...at a restaurant an hour away.

And if we wanted `vanEck's``[1_000_000]`

, we would

quite literally have to come back for it next week.

So...we managed to make our solution as simple as possible,

but at the cost of truly woeful, completely non-scalable

runtime performance.

And, when you stop to think about it, it's not surprising that it runs

so poorly. To generate the next van Eck's number, our code has to look

through (potentially) every one of the *N-1* earlier numbers, comparing

each of them to the most recent number in the sequence. In other

words, computing a single van Eck's number takes *O(N)* operations,

so computing *N* of them must be *O(N²)*. Hence, over a minute to find

a mere 10 thousand numbers of the sequence, several hours to find the

first 100 thousand, and nearly seven days to find the first million.

Obviously, before we can reap the countless benefits of an unlimited supply

of van Eck's numbers, we're going to need to find a way to eliminate

that repeated *O(N)* search inside the code object.

Happily, that's not too difficult. There's a kind of algorithmic Lorentz contraction

that we can often apply to these kinds of situations: we can literally shrink

the time dimension...by expanding the space dimension. More prosaically,

we can trade more memory for extra speed.

The trick in this case is to permanently track (in a static variable) the index *M*

of the most recent prior instance of each number in the sequence so far.

Then, when we encounter that number again, we can perform a single look-up

on our variable to find the most-recent prior index for that number.

All we have to do then is update our remembered *M* value each time

(since we've just seen the same number again at a later index: *N*).

A `next-vanEck's` subroutine based on that approach would look like this:

` # Version 4...`

` sub `

`next-vanEck's`` {`

` state @M;`

` my $N = @_.end;`

` my $M = @M[@_.tail] `

`//`` $N;`

` @M[@_.tail] = $N;`

` return $N - $M;`

` }`

The `@M`

array tracks the most recent prior indexes of each previous value.

So, when we need that *M* value to compute the next number, now we simply

look up its location up directly: `@M[@_.tail]`

We then update the `@M`

array with the new most-recent position of

the current value: `@M[@_.tail] = $N`

Running the sequence operation with this version of the subroutine,

we can compute `vanEck's``[10_000]`

in about 15 seconds, instead of 60.

A fourfold improvement in performance is always gratifying,

but we can do *much* better than that.

We're only passing the entire preceding sequence in (as `@_`

) so we can

access the final value (`@_.tail`

) and the current value of *N* (`@.end`

)

at which it occurs. But we could, instead, simply pass in that final value directly.

And the value of *N* merely increments on each call, so we could track it

trivially in another state variable.

Which looks like this:

` # Version 5...`

` sub `

`next-vanEck's`` (`

**$Aₙ**) {

` state (`

**$N**, @M);

` `

**$N++;**

` my $M = @M[`

**$Aₙ**] `//`` $N;`

` @M[$Aₙ] = $N;`

` return $N - $M;`

` }`

Because the subroutine now takes a single argument `$Aₙ`

(yes, Raku allows subscripted alphanumerics in identifiers),

each time the sequence operator generates a new value,

it now passes the code object only the final value

of the sequence so far.

We also track the current index, by incrementing `$N`

each time the subroutine is called.

As before, we determine the *M* value (which is now: `@M[$Aₙ]`

),

then update it, then return the newly computed next value.

But, because the sequence operator no longer needs to pass the entire

sequence-so-far into the subroutine for every new number it generates,

the cost of each call to `next-vanEck's` drops significantly.

With this version of the subroutine we can compute `vanEck's``[10_000]`

approximately 100 times faster than previously, in about 0.15 seconds.

Or find `vanEck's``[100_000]`

in about 1.5 seconds.

Or `vanEck's``[1_000_000]`

in about 15 seconds.

That's not just vastly faster than the previous approach;

it's now a completely different order of complexity:

clearly *O(N)* linear time, rather than *O(N²)* quadratic.

This massive improvement in performance was not, however, free.

The `@M`

array is consistently about 97% as long as the sequence itself,

though slightly sparse, so it may only require about 85% as much space.

Nevertheless, this means that the memory footprint of this solution has

effectively doubled compared to version 1. In most situations, though, that's

a perfectly acceptable trade-off: memory is relatively cheap; time is priceless.

But could we also code this version in a single statement?

Of course, we can.

Once again, we simply inline the subroutine's code block:

` constant `

`vanEck's ``= 0, { state ($N, @M);`

` `

` `` $N++;`

` `

` `` my $M = @M[$^Aₙ] `

`//`` $N;`

` `

` `` @M[$^Aₙ] = $N;`

` `

` `` $N - $M;`

` `

` `` } ... ∞;`

Now, of course, if I'd *started* with that, just sprung it on you with

no warning and no explanation, then you'd be quite justified in

complaining that it's scary and incomprehensible and (gasp!) different.

New concepts and new approaches and new tools and new syntax

are always challenging. Especially all four at once.

But as you come to understand the power and expressiveness of Raku,

then perhaps a solution like that is not so scary, and you start to see

how it's possible to write code that looks very much like the original

description of the problem, in something very like the original notation

of the problem, and hence which is actually *more* comprehensible

to anyone who is already familiar with the task domain.

And, because there's always more than one way to do it in Raku,

it's still perfectly fine to write that final version as:

` constant `

`vanEck's`` = 0, `

`&next-vanEck's`` ... ∞;`

...hiding away the algorithm and the optimizations inside that subroutine.

And later changing them within the subroutine when you find a cleverer

approach, a better algorithm, or an even more efficient Lorentzian trade-off.

Both Perl and Raku cop a lot of flak for their unapologetically

*"More Than One Way To Do It"* philosophies. But, to me,

the flexible and transparadigm nature of the two languages

simply means that I'm never stuck with the inadequate concepts,

or the inefficient algorithms, or the inconvenient software tools,

or the intrinsic performance limitations that someone else

hard-wired into their *"One Right Way To Do It"*.

And, for me, that makes the choice of Raku...as simple as possible.

Great analysis and big push for the language itself. You have been inspirational as ever.

Damian's communication style (in person, in presentation, in writing, or in any other form that I've interacted with him) is always so clear. This is yet another example of why I love the Perl community: we're trying to do better than each other, in a way that brings the whole community up with us.

Keep up the good work!

You version 4 must be:

```

constant vanEck's = 0, {@_.end - (@_.head(*-1).first(@_.tail):end:k//(@_.end))}...∞;

```

Otherwise, we will get undefined error on `$N`.

I learned a lot reading this--thanks! Your explanations are so clear, and the improvements in speed you got from thinking and algorithm improvement makes the idea that perl6, which might be 30% slower than xyz language, moot. I'm trying to switch to perl6, and hope you keep up this quality of blogging!

Yes indeed!

Thanks for spotting that.

I've corrected it in the text.

Much appreciated!