Depth-first nodetree

By kaare on May 14, 2010 12:05 PM

Ovid posted an interesting question about how to sort threads in a forum so you'll get the posts that belong to each thread before the next thread starts. Perl part here.

The question is,

Can I do that in straight SQL (this is PostgreSQL 8.4) or should additional information be added to this table?

As a matter of fact, in PostgreSQL 8.4 you can do it in straight SQL with no further information necessary. I will show how.

To set up the example we'll need a table:

CREATE TABLE nodes (
	id int PRIMARY KEY,
	parent int REFERENCES nodes(id)
);

and some data:

INSERT INTO nodes VALUES (1,null),(2,1),(3,1),(4,1),(5,1),
    (6,2),(7,1),(8,6),(9,5);

In order to get a meaningful tree from this, we will use a recursive Common Table Expression (CTE). The concept of CTE's is a SQL ISO standard, but it's new to PostgreSQL in version 8.4.

A recursive CTE can be thought about as a dynamic self-union, figuring out what to union with in the process.

It consists of an anchor that gives a starting point, and the recursive part.

The full query is here:

WITH RECURSIVE nodetree(level, id, pid, sort) AS (
  SELECT 1, id, parent, '{1}'::int[] FROM nodes WHERE parent IS NULL
    UNION
  SELECT level+1,p.id, parent, sort||p.id FROM nodetree pr
  JOIN nodes p ON p.parent = pr.id
)
SELECT * FROM nodetree ORDER BY sort;

The first select defines the starting point; we find the root node and initiates the level and sort values.

The next select calculates the level and pushes the id onto the sort array. You could use a string instead of an array, but PostgreSQL will do The Right Thing when ordering by an array.
Note that '||' is the concatenation operator for arrays (as well as for strings) in PostgreSQL.

Only left now is getting the result and sorting it. You get a level for free, and if you want to cut off the tree at a given level, you can do it in the recursive select.

4 comments

Tagged as:

perl, postgresql

4 Comments

Christian Brink | May 14, 2010 2:08 PM | Reply

Great find.

One note, that should be a UNION ALL.

Ovid | May 16, 2010 12:43 PM | Reply

Thanks for sharing this. Any knowledge of the performance impact on this?

I am curious to look into this more. There are two reasons why SQL can't replace a logic programming language: recursion and lists. PostgreSQL natively provides both. Unfortunately, the list implementation seems rather primitive. I'd actually want to be able to store proper relations in the database than a list. The object-relational impedance mismatch largely goes away by allowing attributes to store relations (though I don't know of any ORMs which try and target this).

kaare replied to comment from Ovid | May 16, 2010 7:25 PM | Reply

No, I don't have any hard numbers, but I would expect the recursive solution to perform better than any solution where you have to use an external language.

kaare replied to comment from Ovid | May 17, 2010 5:21 PM | Reply

After I had a look on the other solutions, I can see that my reply is, well, inaccurate.

Theoretically, having PostgreSQL find the data sets dynamically might give a slightly longer response time.
That's theoretical. I really doubt you will be able to measure any difference in any sane case.

The advantage of the recursive cte solution is that it doesn't rely on any trigger or redundant information. Just an id and parent.

Name

Email Address

URL

Remember personal info?

Comments (You may use HTML tags for style)

About kaare

I blog about Perl and stuff.

More info »

kaare