Enter the Matrix ... with PDL

By Enkidu on March 19, 2020 10:58 AM under Numerical Perl

We interrupt this k-Means broadcast to bring you an important message about threading (the PDL kind, not the Perl kind - darn those overloaded terms!)

The Assignment

Take two vectors, x and y, and create a matrix C from a function of the values of each element pair, such that

$C_{ij}=\frac{1}{x_i-y_j}$

Cribbing from NumPy's outer function, that function looks like it hands you back an iterator over all pairs which you give to your function. PDL's outer primitive only does outer product, so close, but no cigar.

I know what. I'll ...

Don't loop!

A lot of you may have reached for a pair of nested for loops. Avoid that if you can. PDL's threading takes care of it and does it faster ... and it lets you write the function in one line. Here's an example stolen from 100 PDL Exercises

$ perldl

pdl> $PDL::doubleformat = '%4.2g'   # for pretty printing

pdl> $x = sequence(8)
pdl> $y = sequence(7) + 0.5

pdl> $c = 1/($x->dummy(1) - $y->dummy(0))
pdl> p $c
[
 [   -2     2  0.67   0.4  0.29  0.22  0.18  0.15]
 [-0.67    -2     2  0.67   0.4  0.29  0.22  0.18]
 [ -0.4 -0.67    -2     2  0.67   0.4  0.29  0.22]
 [-0.29  -0.4 -0.67    -2     2  0.67   0.4  0.29]
 [-0.22 -0.29  -0.4 -0.67    -2     2  0.67   0.4]
 [-0.18 -0.22 -0.29  -0.4 -0.67    -2     2  0.67]
 [-0.15 -0.18 -0.22 -0.29  -0.4 -0.67    -2     2]
]

A caveat

Don't tie yourself into knots over threading. There are those that take the admonition against looping as dogma and contort themselves to avoid it. Sometimes you may need to explicitly loop when it makes sense, but this case isn't one of them.

Why doesn't the matrix update

If I change the underlying vector, why do I have to recalculate the matrix? I thought I had to explicitly make a copy or sever to stop the values from flowing.

Well, to get the data flowing I needed the .= operator overloaded to do such a thing ... aaaaand it doesn't work with dummy dimensions. See Propagated assignments in Indexing

operator overload

From the documentation on outer it is possible to achieve the effects of outer product simply by threading over the "*" operator

Wait?!? outer is a convenience function? Who ARE these people and from what dimension do they hail that threading is so mundane? What else do they have hidden up their sleeves?

I'm not one for diving into the source code, but now I'm sorely tempted just to see the madness that lies within.

Extend the technique to higher dimensions

There are more dimensions than you have dreamt of in your philosophy, Horatio.

pdl> $x = sequence(8)
pdl> $y = sequence(7) + 0.2
pdl> $z = sequence(6) + 0.3

pdl> $c = 1/($x->dummy(1)->dummy(2)
- $y->dummy(0)->dummy(2)
+ $z->dummy(0)->dummy(1)
)
pdl> p $c->dims
8 7 6

pdl> p $c
[
 [
  [   10  0.91  0.48  0.32  0.24   0.2  0.16  0.14]
  [ -1.1    10  0.91  0.48  0.32  0.24   0.2  0.16]
  [-0.53  -1.1    10  0.91  0.48  0.32  0.24   0.2]
  [-0.34 -0.53  -1.1    10  0.91  0.48  0.32  0.24]
  [-0.26 -0.34 -0.53  -1.1    10  0.91  0.48  0.32]
  [ -0.2 -0.26 -0.34 -0.53  -1.1    10  0.91  0.48]
  [-0.17  -0.2 -0.26 -0.34 -0.53  -1.1    10  0.91]
 ]
 [
  [ 0.91  0.48  0.32  0.24   0.2  0.16  0.14  0.12]
  [   10  0.91  0.48  0.32  0.24   0.2  0.16  0.14]
  [ -1.1    10  0.91  0.48  0.32  0.24   0.2  0.16]
  [-0.53  -1.1    10  0.91  0.48  0.32  0.24   0.2]
  [-0.34 -0.53  -1.1    10  0.91  0.48  0.32  0.24]
  [-0.26 -0.34 -0.53  -1.1    10  0.91  0.48  0.32]
  [ -0.2 -0.26 -0.34 -0.53  -1.1    10  0.91  0.48]
 ]
 [
  [ 0.48  0.32  0.24   0.2  0.16  0.14  0.12  0.11]
  [ 0.91  0.48  0.32  0.24   0.2  0.16  0.14  0.12]
  [   10  0.91  0.48  0.32  0.24   0.2  0.16  0.14]
  [ -1.1    10  0.91  0.48  0.32  0.24   0.2  0.16]
  [-0.53  -1.1    10  0.91  0.48  0.32  0.24   0.2]
  [-0.34 -0.53  -1.1    10  0.91  0.48  0.32  0.24]
  [-0.26 -0.34 -0.53  -1.1    10  0.91  0.48  0.32]
 ]
 [
  [ 0.32  0.24   0.2  0.16  0.14  0.12  0.11 0.099]
  [ 0.48  0.32  0.24   0.2  0.16  0.14  0.12  0.11]
  [ 0.91  0.48  0.32  0.24   0.2  0.16  0.14  0.12]
  [   10  0.91  0.48  0.32  0.24   0.2  0.16  0.14]
  [ -1.1    10  0.91  0.48  0.32  0.24   0.2  0.16]
  [-0.53  -1.1    10  0.91  0.48  0.32  0.24   0.2]
  [-0.34 -0.53  -1.1    10  0.91  0.48  0.32  0.24]
 ]
 [
  [ 0.24   0.2  0.16  0.14  0.12  0.11 0.099  0.09]
  [ 0.32  0.24   0.2  0.16  0.14  0.12  0.11 0.099]
  [ 0.48  0.32  0.24   0.2  0.16  0.14  0.12  0.11]
  [ 0.91  0.48  0.32  0.24   0.2  0.16  0.14  0.12]
  [   10  0.91  0.48  0.32  0.24   0.2  0.16  0.14]
  [ -1.1    10  0.91  0.48  0.32  0.24   0.2  0.16]
  [-0.53  -1.1    10  0.91  0.48  0.32  0.24   0.2]
 ]
 [
  [  0.2  0.16  0.14  0.12  0.11 0.099  0.09 0.083]
  [ 0.24   0.2  0.16  0.14  0.12  0.11 0.099  0.09]
  [ 0.32  0.24   0.2  0.16  0.14  0.12  0.11 0.099]
  [ 0.48  0.32  0.24   0.2  0.16  0.14  0.12  0.11]
  [ 0.91  0.48  0.32  0.24   0.2  0.16  0.14  0.12]
  [   10  0.91  0.48  0.32  0.24   0.2  0.16  0.14]
  [ -1.1    10  0.91  0.48  0.32  0.24   0.2  0.16]
 ]
]

So for each vector, create a dummy dimension for the other vectors (implicit threading takes care of the sizes) and then get functional with your newly compatible multi-dimensional matrices.

And Robert is your mother's brother.

Writing the function that does this succinctly for N dimensions is left as an exercise for the reader.

Stay tuned for the exciting conclusion of k-Means Street, a blog post by Martin Scorsese staring Harvey Keitel and Robert De Niro.

4 comments

Tagged as:

PDL

4 Comments

Renormalist | March 24, 2020 1:17 PM | Reply

Finally I see more blog entries about PDL - thumbs up!

I need it basically once a year, read and fiddle around, then forget everything, then try to recall next year, etc., I will definitely come back here for (re-)finding examples and explanations.

Thanks!

Mohawk | December 21, 2021 4:29 PM | Reply

Superb!

Mohawk | November 4, 2024 3:26 PM | Reply

This would be super-easy - indeed, barely even an inconvenience - with the relatively-recently cribbed-from-MATLAB meshgrid.

$C = 1/meshgrid($x, $y)

Mohawk | November 4, 2024 4:49 PM | Reply

Whoops! That should be:

($xmesh, $ymesh) = meshgrid($x, $y);
$C = 1/($xmesh - $ymesh);

See https://metacpan.org/pod/PDL::Slices#meshgrid for more.

Name

Email Address

URL

Remember personal info?

Comments (You may use HTML tags for style)

About Enkidu

I am a Freelance Scientist** and Perl is my Igor.

More info »

enkidu