Convoluted solutions

Storing tree-like data structures (such as webpages, categories, blog comments, forum topics, nested menus, organization charts) in relational databases is a real challenge and even when you’re using ready-made optimized libraries it’s still convoluted.

  • adjacency list is recursive and don’t scale well;
  • materialized path uses unanchored text-comparison (like '%this%') — better scaling but no cigar;
  • nested sets (aka modified preorder tree traversal) are heavy maintenance: easy for data-reads one select query can fetch a lot of what you may need, but some update scenarios can touch half or more of the records in that table;
  • closure trees duplicate lots of record-to-record relationships and updates can remove/reinsert lots of record-to-record relationships;

All of the above patterns usually also use adjacency list in parallel to allow for consistency checks — in case the application-level implementation is bad or transactions aren’t used (or available) the trees can end up in incoherent states which need to be corrected (with more fallible code).

Would you try saying all of that in one single breath? It all seems too damn convoluted to work with.

Hats of to some

Oracle has some (syntactic) support for selecting hierarchical data from adjacency lists. It’s still searching for data recursively — all other patterns try to avoid this — but at least it’s being done without round-trip TCP communication from the application to the database and with possible query planner optimizations included.

But unfortunately there is not indexing support included with this Oracle feature which is where the support should be on this topic.

PostgreSQL offers syntactic support for querying materialized path columns as well as a custom data type for them with support for indexing said data type (using GiST).

Can you spell dysfunctional?

We need the ease of use of adjacency lists — only specify a parent — with the syntactical support Oracle offers them and the data type and indexing support PostgreSQL is offering.

To say this in one breath: CREATE INDEX category_parent ON category(parent) USING HIERARCHICAL, why doesn’t this just work after 40 years of SQL evolution?

Advertisements