Depth-first nodetree

Ovid posted an interesting question about how to sort threads in a forum so you'll get the posts that belong to each thread before the next thread starts. Perl part here.

The question is,

Can I do that in straight SQL (this is PostgreSQL 8.4) or should additional information be added to this table?
As a matter of fact, in PostgreSQL 8.4 you can do it in straight SQL with no further information necessary. I will show how.

To set up the example we'll need a table:

CREATE TABLE nodes (
	id int PRIMARY KEY,
	parent int REFERENCES nodes(id)
);

and some data:

INSERT INTO nodes VALUES (1,null),(2,1),(3,1),(4,1),(5,1),
    (6,2),(7,1),(8,6),(9,5);


In order to get a meaningful tree from this, we will use a recursive Common Table Expression (CTE). The concept of CTE's is a SQL ISO standard, but it's new to PostgreSQL in version 8.4.

A recursive CTE can be thought about as a dynamic self-union, figuring out what to union with in the process.

It consists of an anchor that gives a starting point, and the recursive part.

The full query is here:

WITH RECURSIVE nodetree(level, id, pid, sort) AS (
  SELECT 1, id, parent, '{1}'::int[] FROM nodes WHERE parent IS NULL
    UNION
  SELECT level+1,p.id, parent, sort||p.id FROM nodetree pr
  JOIN nodes p ON p.parent = pr.id
)
SELECT * FROM nodetree ORDER BY sort;

The first select defines the starting point; we find the root node and initiates the level and sort values.

The next select calculates the level and pushes the id onto the sort array. You could use a string instead of an array, but PostgreSQL will do The Right Thing when ordering by an array.
Note that '||' is the concatenation operator for arrays (as well as for strings) in PostgreSQL.

Only left now is getting the result and sorting it. You get a level for free, and if you want to cut off the tree at a given level, you can do it in the recursive select.

4 Comments

Great find.

One note, that should be a UNION ALL.

Thanks for sharing this. Any knowledge of the performance impact on this?

I am curious to look into this more. There are two reasons why SQL can't replace a logic programming language: recursion and lists. PostgreSQL natively provides both. Unfortunately, the list implementation seems rather primitive. I'd actually want to be able to store proper relations in the database than a list. The object-relational impedance mismatch largely goes away by allowing attributes to store relations (though I don't know of any ORMs which try and target this).

Leave a comment

About kaare

user-pic I blog about Perl and stuff.