Viewed   56 times

I have just made the update/add/delete part for the "Closure table" way of organizing query hierarchical data that are shown on page 70 in this slideshare: http://www.slideshare.net/billkarwin/sql-antipatterns-strike-back

My database looks like this:

Table Categories:

ID         Name
1          Top value
2          Sub value1

Table CategoryTree:

child     parent     level
1          1         0
2          2         0  
2          1         1  

However, I have a bit of an issue getting the full tree back as an multidimensional array from a single query.

Here's what I would like to get back:

 array (

 'topvalue' = array (
                     'Subvalue',
                     'Subvalue2',
                     'Subvalue3)
                     );

 );

Update: Found this link, but I still have a hard time to convert it into an array: http://karwin.blogspot.com/2010/03/rendering-trees-with-closure-tables.html

Update2 : I was able to add depths to each of the categories now, if that can be of any help.

 Answers

2

Okay, I've written PHP classes that extend the Zend Framework DB table, row, and rowset classes. I've been developing this anyway because I'm speaking at PHP Tek-X in a couple of weeks about hierarchical data models.

I don't want to post all my code to because they implicitly get licensed under Creative Commons if I do that. update: I committed my code to the Zend Framework extras incubator and my presentation is Models for Hierarchical Data with SQL and PHP at slideshare.

I'll describe the solution in pseudocode. I'm using zoological taxonomy as test data, downloaded from ITIS.gov. The table is longnames:

CREATE TABLE `longnames` (
  `tsn` int(11) NOT NULL,
  `completename` varchar(164) NOT NULL,
  PRIMARY KEY (`tsn`),
  KEY `tsn` (`tsn`,`completename`)
)

I've created a closure table for the paths in the hierarchy of taxonomy:

CREATE TABLE `closure` (
  `a` int(11) NOT NULL DEFAULT '0',  -- ancestor
  `d` int(11) NOT NULL DEFAULT '0',  -- descendant
  `l` tinyint(3) unsigned NOT NULL,  -- levels between a and d
  PRIMARY KEY (`a`,`d`),
  CONSTRAINT `closure_ibfk_1` FOREIGN KEY (`a`) REFERENCES `longnames` (`tsn`),
  CONSTRAINT `closure_ibfk_2` FOREIGN KEY (`d`) REFERENCES `longnames` (`tsn`)
)

Given the primary key of one node, you can get all its descendants this way:

SELECT d.*, p.a AS `_parent`
FROM longnames AS a
JOIN closure AS c ON (c.a = a.tsn)
JOIN longnames AS d ON (c.d = d.tsn)
LEFT OUTER JOIN closure AS p ON (p.d = d.tsn AND p.l = 1)
WHERE a.tsn = ? AND c.l <= ?
ORDER BY c.l;

The join to closure AS p is to include each node's parent id.

The query makes pretty good use of indexes:

+----+-------------+-------+--------+---------------+---------+---------+----------+------+-----------------------------+
| id | select_type | table | type   | possible_keys | key     | key_len | ref      | rows | Extra                       |
+----+-------------+-------+--------+---------------+---------+---------+----------+------+-----------------------------+
|  1 | SIMPLE      | a     | const  | PRIMARY,tsn   | PRIMARY | 4       | const    |    1 | Using index; Using filesort |
|  1 | SIMPLE      | c     | ref    | PRIMARY,d     | PRIMARY | 4       | const    | 5346 | Using where                 |
|  1 | SIMPLE      | d     | eq_ref | PRIMARY,tsn   | PRIMARY | 4       | itis.c.d |    1 |                             |
|  1 | SIMPLE      | p     | ref    | d             | d       | 4       | itis.c.d |    3 |                             |
+----+-------------+-------+--------+---------------+---------+---------+----------+------+-----------------------------+

And given that I have 490,032 rows in longnames and 4,299,883 rows in closure, it runs in pretty good time:

+--------------------+----------+
| Status             | Duration |
+--------------------+----------+
| starting           | 0.000257 |
| Opening tables     | 0.000028 |
| System lock        | 0.000009 |
| Table lock         | 0.000013 |
| init               | 0.000048 |
| optimizing         | 0.000032 |
| statistics         | 0.000142 |
| preparing          | 0.000048 |
| executing          | 0.000008 |
| Sorting result     | 0.034102 |
| Sending data       | 0.001300 |
| end                | 0.000018 |
| query end          | 0.000005 |
| freeing items      | 0.012191 |
| logging slow query | 0.000008 |
| cleaning up        | 0.000007 |
+--------------------+----------+

Now I post-process the result of the SQL query above, sorting the rows into subsets according to the hierarchy (pseudocode):

while ($rowData = fetch()) {
  $row = new RowObject($rowData);
  $nodes[$row["tsn"]] = $row;
  if (array_key_exists($row["_parent"], $nodes)) {
    $nodes[$row["_parent"]]->addChildRow($row);
  } else {
    $top = $row;
  }
}
return $top;

I also define classes for Rows and Rowsets. A Rowset is basically an array of rows. A Row contains an associative array of row data, and also contains a Rowset for its children. The children Rowset for a leaf node is empty.

Rows and Rowsets also define methods called toArrayDeep() which dump their data content recursively as a plain array.

Then I can use the whole system together like this:

// Get an instance of the taxonomy table data gateway 
$tax = new Taxonomy();

// query tree starting at Rodentia (id 180130), to a depth of 2
$tree = $tax->fetchTree(180130, 2);

// dump out the array
var_export($tree->toArrayDeep());

The output is as follows:

array (
  'tsn' => '180130',
  'completename' => 'Rodentia',
  '_parent' => '179925',
  '_children' => 
  array (
    0 => 
    array (
      'tsn' => '584569',
      'completename' => 'Hystricognatha',
      '_parent' => '180130',
      '_children' => 
      array (
        0 => 
        array (
          'tsn' => '552299',
          'completename' => 'Hystricognathi',
          '_parent' => '584569',
        ),
      ),
    ),
    1 => 
    array (
      'tsn' => '180134',
      'completename' => 'Sciuromorpha',
      '_parent' => '180130',
      '_children' => 
      array (
        0 => 
        array (
          'tsn' => '180210',
          'completename' => 'Castoridae',
          '_parent' => '180134',
        ),
        1 => 
        array (
          'tsn' => '180135',
          'completename' => 'Sciuridae',
          '_parent' => '180134',
        ),
        2 => 
        array (
          'tsn' => '180131',
          'completename' => 'Aplodontiidae',
          '_parent' => '180134',
        ),
      ),
    ),
    2 => 
    array (
      'tsn' => '573166',
      'completename' => 'Anomaluromorpha',
      '_parent' => '180130',
      '_children' => 
      array (
        0 => 
        array (
          'tsn' => '573168',
          'completename' => 'Anomaluridae',
          '_parent' => '573166',
        ),
        1 => 
        array (
          'tsn' => '573169',
          'completename' => 'Pedetidae',
          '_parent' => '573166',
        ),
      ),
    ),
    3 => 
    array (
      'tsn' => '180273',
      'completename' => 'Myomorpha',
      '_parent' => '180130',
      '_children' => 
      array (
        0 => 
        array (
          'tsn' => '180399',
          'completename' => 'Dipodidae',
          '_parent' => '180273',
        ),
        1 => 
        array (
          'tsn' => '180360',
          'completename' => 'Muridae',
          '_parent' => '180273',
        ),
        2 => 
        array (
          'tsn' => '180231',
          'completename' => 'Heteromyidae',
          '_parent' => '180273',
        ),
        3 => 
        array (
          'tsn' => '180213',
          'completename' => 'Geomyidae',
          '_parent' => '180273',
        ),
        4 => 
        array (
          'tsn' => '584940',
          'completename' => 'Myoxidae',
          '_parent' => '180273',
        ),
      ),
    ),
    4 => 
    array (
      'tsn' => '573167',
      'completename' => 'Sciuravida',
      '_parent' => '180130',
      '_children' => 
      array (
        0 => 
        array (
          'tsn' => '573170',
          'completename' => 'Ctenodactylidae',
          '_parent' => '573167',
        ),
      ),
    ),
  ),
)

Re your comment about calculating depth -- or really length of each path.

Assuming you've just inserted a new node to your table that holds the actual nodes (longnames in the example above), the id of the new node is returned by LAST_INSERT_ID() in MySQL or else you can get it somehow.

INSERT INTO Closure (a, d, l)
  SELECT a, LAST_INSERT_ID(), l+1 FROM Closure
  WHERE d = 5 -- the intended parent of your new node 
  UNION ALL SELECT LAST_INSERT_ID(), LAST_INSERT_ID(), 0;
Sunday, September 11, 2022
5

In model

$res = $this->db->get(); 
return $res->result_array();

then apply

foreach($answer as $a)
{
  var_dump( $a );
} 

use $a to get values

Sunday, August 14, 2022
 
2

You can use a Nested Set Model as it yields very efficient queries. Check out Managing Hierarchical Data in MySQL and read the section called Nested Set Model.

If you're using an ORM like Doctrine, it includes nested set capabilities.

It can be difficult for some to grasp the nested set concepts of left and right. I have found that using those numbers as an analogy for the line numbers of open/close tags in an XML document, folks find it easier to grasp.

For instance, take the data example from the MySQL link above:

+-------------+----------------------+-----+-----+
| category_id | name                 | lft | rgt |
+-------------+----------------------+-----+-----+
|           1 | ELECTRONICS          |   1 |  20 |
|           2 | TELEVISIONS          |   2 |   9 |
|           3 | TUBE                 |   3 |   4 |
|           4 | LCD                  |   5 |   6 |
|           5 | PLASMA               |   7 |   8 |
|           6 | PORTABLE ELECTRONICS |  10 |  19 |
|           7 | MP3 PLAYERS          |  11 |  14 |
|           8 | FLASH                |  12 |  13 |
|           9 | CD PLAYERS           |  15 |  16 |
|          10 | 2 WAY RADIOS         |  17 |  18 |
+-------------+----------------------+-----+-----+

If you take the lft, rgt fields and use them as line numbers for an XML document, you get:

1. <electronics>
2.    <televisions>
3.        <tube>
4.        </tube>
5.        <lcd>
6.        </lcd>
7.        <plasma>  
8.        </plasma> 
9.     </televisions>
10.    <portable electronics>
11.        <mp3 players>
12.            <flash>
13.            </flash>
14.        </mp3 players>
15.        <cd players>
16.        </cd players>
17.        <2 way radios>
18.        </2 way radios>
19.    </portable electronics>
20. </electronics>

Seeing it this way can make it much easier for some to visualize the resulting nested set hierarchy. It also makes it clearer why this approach improves efficiency as it makes it possible to select entire nodes without the need for multiple queries or joins.

Monday, November 14, 2022
 
3

Personally I prefer the first option, that is, separate tables for Sales and Receiving.

The two biggest disadvantage in option number 2 or merging two tables into one are:

1) Inflexibility
2) Unnecessary filtering when use

First on inflexibility. If your requirements expanded (or you just simply overlooked it) then you will have to break up your schema or you will end up with unnormalized tables. For example let's say your sales would now include the Sales Clerk/Person that did the sales transaction so obviously it has nothing to do with 'Receiving'. And what if you do Retail or Wholesale sales how would you accommodate that in your merged tables? How about discounts or promos? Now, I am identifying the obvious here. Now, let's go to Receiving. What if we want to tie up our receiving to our Purchase Order? Obviously, purchase order details like P.O. Number, P.O. Date, Supplier Name etc would not be under Sales but obviously related more to Receiving.

Secondly, on unnecessary filtering when use. If you have merged tables and you want only to use the Sales (or Receving) portion of the table then you have to filter out the Receiving portion either by your back-end or your front-end program. Whereas if it a separate table you have just to deal with one table at a time.

Additionally, you mentioned ORM, the first option would best fit to that endeavour because obviously an object or entity for that matter should be distinct from other entity/object.

Wednesday, December 7, 2022
 
5

This sort of this is probably better suited for a graph style of data store. Something akin to how facebook keeps hierarchies of relationships.

If you are bound and determined to use MySQL you could probably get away with your schema by using a recursive search. Since your tree can be of variable depth you could start self-joining at given location and 'walk' down a branch recursing until you didn't find anymore descendants. Return that branch and start down the next one. Similar process for traversing up to find parents.

Thursday, August 25, 2022
Only authorized users can answer the search term. Please sign in first, or register a free account.
Not the answer you're looking for? Browse other questions tagged :