Viewed   80 times

Im currently working on a site which will contain a products catalog. I am a little new to database design so I'm looking for advice on how best to do this. I am familiar with relational database design so I understand "many to many" or "one to many" etc (took a good db class in college). Here is an example of what an item might be categorized as:

Propeller -> aircraft -> wood -> brand -> product.

Instead of trying to write what I have so far, just take a quick look at this image I created from the phpmyadmin designer feature.

alt text

Now, this all seemed fine and dandy, until I realized that the category "wood" would also be used under propeller -> airboat -> (wood). This would mean, that "wood" would have to be recreated every time I want to use it under a different parent. This isn't the end of the world, but I wanted to know if there is a more optimal way to go about this.

Also, I am trying to keep this thing as dynamic as possible so the client can organize his catalog as his needs change.

*Edit. Was thinking about just creating a "tags" table. So I could assign the tag "wood" or "metal" or "50inch" to 1 to many items. I would still keep a parenting type thing for the main categories, but this way the categories wouldnt have to go so deep and there wouldnt be the repetition.



First, the user interface: as user I hate to search a product in a catalog organized in a strictly hierarchical way. I never remember in what sub-sub-sub-sub...-category an "exotic" product is in and this force me to waste time exploring "promising" categories just to discover it is categorized in a (for me, at least) strange way.

What Kevin Peno suggests is a good advice and is known as faceted browsing. As Marcia Bates wrote in After the Dot-Bomb: Getting Web Information Retrieval Right This Time, " .. faceted classification is to hierarchical classification as relational databases are to hierarchical databases. .. ".

In essence, faceted search allows users to search your catalog starting from whatever "facet" they prefer and let them filter information choosing other facets along the search. Note that, contrary to how tag systems are usually conceived, nothing prevents you to organize some of these facets hierarchically.

To quickly understand what faceted search is all about, there are some demos to explore at The Flamenco Search Interface Project - Search Interfaces that Flow.

Second, the application logic: what Manitra proposes is also a good advice (as I understand it), i.e. separating nodes and links of a tree/graph in different relations. What he calls "ancestor table" (which is a much better intuitive name, however) is known as transitive closure of a directed acyclic graph (DAG) (reachability relation). Beyond performance, it simplify queries greatly, as Manitra said.

But I suggest a view for such "ancestor table" (transitive closure), so that updates are in real-time and incremental, not periodical by a batch job. There is SQL code (but I think it needs to be adapted a little to specific DBMSes) in papers I mentioned in my answer to query language for graph sets: data modeling question. In particular, look at Maintaining Transitive Closure of Graphs in SQL (.ps - postscript).

Products-Categories relationship

The first point of Manitra is worth of emphasis, also.

What he is saying is that between products and categories there is a many-to-many relationship. I.e.: each product can be in one or more categories and in each category there can be zero or more products.

Given relation variables (relvars) Products and Categories such relationship can be represented, for example, as a relvar PC with at least attributes P# and C#, i.e. product and category numbers (identifiers) in a foreign-key relationships with corresponding Products and Categories numbers.

This is complementary to management of categories' hierarchies. Of course, this is only a design sketch.

On faceted browsing in SQL

A useful concept to implement "faceted browsing" is relational division, or, even, relational comparisons (see bottom of linked page). I.e. dividing PC (Products-Categories) by a (growing) list of categories chosen from a user (facet navigation) one obtains only products in such categories (of course, categories are presumed not all mutually exclusive, otherwise choosing two categories one will obtain zero products).

SQL-based DBMS usually lack this operators (division and comparisons), so I give below some interesting papers that implement/discuss them:

  • A simpler (and better) SQL approach to relational division (.pdf from Journal of Information Systems Education - Contents Volume 13, Number 2 (2002));
  • Processing frequent itemset discovery queries by division and set containment join operators;
  • Laws for Rewriting Queries Containing Division Operators;
  • Algorithms and Applications for Universal Quantification in Relational Databases;
  • Optimizing Queries with Universal Quantification in Object-Oriented and Object-Relational Databases;
  • (ACM access required) On the complexity of division and set joins in the relational algebra;
  • (ACM access required) Fast algorithms for universal quantification in large databases;

and so on...

I will not go into details here but interaction between categories hierarchies and facet browsing needs special care.

A digression on "flatness"

I briefly looked at the article linked by Pras, Managing Hierarchical Data in MySQL, but I stopped reading after these few lines in the introduction:


Most users at one time or another have dealt with hierarchical data in a SQL database and no doubt learned that the management of hierarchical data is not what a relational database is intended for. The tables of a relational database are not hierarchical (like XML), but are simply a flat list. Hierarchical data has a parent-child relationship that is not naturally represented in a relational database table. ...

To understand why this insistence on flatness of relations is just nonsense, imagine a cube in a three dimensional Cartesian coordinate system: it will be identified by 8 coordinates (triplets), say P1(x1,y1,z1), P2(x2,y2,z2), ..., P8(x8, y8, z8) [here we are not concerned with constraints on these coordinates so that they represent really a cube].

Now, we will put these set of coordinates (points) into a relation variable and we will name this variable Points. We will represent the relation value of Points as a table below:

Points|  x |  y |  z |
       | x1 | y1 | z1 |
       | x2 | y2 | z2 |
       | .. | .. | .. |
       | .. | .. | .. |
       | x8 | y8 | z8 |

Does this cube is being "flattened" by the mere act of representing it in a tabular way? Is a relation (value) the same thing as its tabular representation?

A relation variable assumes as values sets of points in a n-dimensional discrete space, where n is the number of relation attributes ("columns"). What does it mean, for a n-dimensional discrete space, to be "flat"? Just nonsense, as I wrote above.

Don't get me wrong, It is certainly true that SQL is a badly designed language and that SQL-based DBMSes are full of idiosyncrasies and shortcomings (NULLs, redundancy, ...), especially the bad ones, the DBMS-as-dumb-store type (no referential constraints, no integrity constrains, ...). But that has nothing to do with relational data model fantasized limitations, on the contrary: more they turn away from it and worse is the outcome.

In particular, the relational data model, once you understand it, poses no problem in representing whatever structure, even hierarchies and graphs, as I detailed with references to published papers mentioned above. Even SQL can, if you gloss over its deficiencies, missing something better.

On the "The Nested Set Model"

I skimmed the rest of that article and I'm not particularly impressed by such logical design: it suggests to muddle two different entities, nodes and links, into one relation and this will probably cause awkwardness. But I'm not inclined to analyze that design more thoroughly, sorry.

EDIT: Stephan Eggermont objected, in comments below, that " The flat list model is a problem. It is an abstraction of the implementation that makes performance difficult to achieve. ... ".

Now, my point is, precisely, that:

  1. this "flat list model" is a fantasy: just because one lay out (represents) relations as tables ("flat lists") does not mean that relations are "flat lists" (an "object" and its representations are not the same thing);
  2. a logical representation (relation) and physical storage details (horizontal or vertical decompositions, compression, indexes (hashes, b+tree, r-tree, ...), clustering, partitioning, etc.) are distinct; one of the points of relational data model (RDM) is to decouple logical from "physical" model (with advantages to both users and implementors of DBMSes);
  3. performance is a direct consequence of physical storage details (implementation) and not of logical representation (Eggermont's comment is a classic example of logical-physical confusion).

RDM model does not constraint implementations in any way; one is free to implement tuples and relations as one see fit. Relations are not necessarily files and tuples are not necessarily records of a file. Such correspondence is a dumb direct-image implementation.

Unfortunately SQL-based DBMS implementations are, too often, dumb direct-image implementations and they suffer poor performance in a variety of scenarios - OLAP/ETL products exist to cover these shortcomings.

This is slowly changing. There are commercial and free software/open source implementations that finally avoid this fundamental pitfall:

  • Vertica, which is a commercial successor of..
  • C-Store: A Column-Oriented DBMS;
  • MonetDB;
  • LucidDB;
  • Kdb in a way;
  • an so on...

Of course, the point is not that there must exist an "optimal" physical storage design, but that whatever physical storage design can be abstracted away by a nice declarative language based on relational algebra/calculi (and SQL is a bad example) or more directly on a logic programming language (like Prolog, for example - see my answer to "prolog to SQL converter" question). A good DBMS should be change physical storage design on-the-fly, based on data access statistics (and/or user hints).

Finally, in Eggermont's comment the statement " The relational model is getting squeeezed between the cloud and prevayler. " is another nonsense but I cannot give a rebuttal here, this comment is already too long.

Saturday, November 19, 2022

To check if given model is related to another one, which is what you want if I get you right, all you need is this tiny method making the most of Eloquent:

(Implement it in BaseModel, Entity or a scope, whatever suits you)

// usage
$task->isRelatedTo('transactions.users', $id);
// or
$template->isRelatedTo('tasks.transactions.users', Auth::user());

// or any kind of relation:
// imagine this: User m-m Transaction 1-m Item m-1 Group
$group->isRelatedTo('items.transaction.users', $id);

The magic happens here:

 * Check if it is related to any given model through dot nested relations
 * @param  string  $relations
 * @param  int|IlluminateDatabaseEloquentModel  $id
 * @return boolean
public function isRelatedTo($relations, $id)
    $relations = explode('.', $relations);

    if ($id instanceof Model)
        $related = $id;
        $id = $related->getKey();
        $related = $this->getNestedRelated($relations);

    // recursive closure
    $callback = function ($q) use (&$callback, &$relations, $related, $id) 
        if (count($relations))
            $q->whereHas(array_shift($relations), $callback);
            $q->where($related->getQualifiedKeyName(), $id);

    return (bool) $this->whereHas(array_shift($relations), $callback)->find($this->getKey());

protected function getNestedRelated(array $relations)
    $models = [];

    foreach ($relations as $key => $relation)
        $parent = ($key) ? $models[$key-1] : $this;
        $models[] = $parent->{$relation}()->getRelated();

    return end($models);

Hey, but what's going on there?

isRelatedTo() works like this:

  1. check if passed $id is a model or just an id, and prepares $related model and its $id for use in the callback. If you don't pass an object then Eloquent needs to instantiate all the related models on the $relations (relation1.relation2.relation3...) chain to get the one we are interested in - that's what happens in getNestedRelated(), pretty straightforward.

  2. then we need to do something like this:

    // assuming relations 'relation1.relation2.relation3'
    $this->whereHas('relation1', function ($q) use ($id) {
       $q->whereHas('relation2', function ($q) use ($id) {
          $q->whereHas('relation3', function ($q) use ($id) {
             $q->where('id', $id);
    // returns new instance of current model or null, thus cast to (bool)
  3. since we don't know how deeply the relation is nested, we need to use recurrency. However we pass a Closure to the whereHas, so we need to use little trick in order to call itself inside its body (in fact we don't call it, but rather pass it as $callback to the whereHas method, since the latter expects a Closure as 2nd param) - this might be useful for those unfamiliar Anonymous recursive PHP functions:

    // save it to the variable and pass it by reference
    $callback = function () use (&$callback) {
      if (...) // call the $callback again
      else // finish;
  4. we also pass to the closure $relations (as an array now) by reference in order to unshift its elements, and when we got them all (meaning we nested whereHas), we finally put the where clause instead of another whereHas, to search for our $related model.

  5. finally let's return bool

Sunday, September 11, 2022

Try this query but if there was some data i would be able to check although i have checked with dummy data

  IFNULL(,'') as Game,
  IFNULL(,'') as Prize,
  case when != '' then 'Assigned' when != '' then 'Assigned' else 'Not assigned yet' end as `Status`
from token as t
  left join (select *
         from games
         where token_id not in(select
                   from prize)) as g
    on g.token_id = t.token_id
  left join (select *
         from prize
         where token_id not in(select
                   from games)) as p
    on p.token_id = t.token_id

Then it should be the most simple thing to do

select *
from `user`
  left join token
    on user.user_id = token.user_id
  left join games
    on games.token_id = token.token_id
  left join prize
    on prize.token_id = token.token_id
where user.user_id = 1
Sunday, November 13, 2022

You mention the most commonly implemented, which is Adjacency List:

There are other models as well, including materialized path and nested sets:

Joe Celko has written a book on this subject, which is a good reference from a general SQL perspective (it is mentioned in the nested set article link above).

Also, Itzik Ben-Gann has a good overview of the most common options in his book "Inside Microsoft SQL Server 2005: T-SQL Querying".

The main things to consider when choosing a model are:

1) Frequency of structure change - how frequently does the actual structure of the tree change. Some models provide better structure update characteristics. It is important to separate structure changes from other data changes however. For example, you may want to model a company's organizational chart. Some people will model this as an adjacency list, using the employee ID to link an employee to their supervisor. This is usually a sub-optimal approach. An approach that often works better is to model the org structure separate from employees themselves, and maintain the employee as an attribute of the structure. This way, when an employee leaves the company, the organizational structure itself does not need to be changes, just the association with the employee that left.

2) Is the tree write-heavy or read-heavy - some structures work very well when reading the structure, but incur additional overhead when writing to the structure.

3) What types of information do you need to obtain from the structure - some structures excel at providing certain kinds of information about the structure. Examples include finding a node and all its children, finding a node and all its parents, finding the count of child nodes meeting certain conditions, etc. You need to know what information will be needed from the structure to determine the structure that will best fit your needs.

Tuesday, October 11, 2022

If you do a structure like similar to this:

   - chatUID
       - members
           - userUID
       - lastMessageSent:messageUID
       - ... more properties  

   - chatUID
     - messageUID
         - sentBy: userUID
         - messageDate:""
         - messageTime:""
         - message:""

    - userUID
       - chatUID

you can attach a listener to /userChats/userUID, which will display active chats, and a listener to /chatMessages/chatUID, which will get all chat messages for a specific chat conversation.

This way is a lot easier to setup firebase security rules, and users will only receive chat messages which they are apart of.

Monday, September 19, 2022
Only authorized users can answer the search term. Please sign in first, or register a free account.
Not the answer you're looking for? Browse other questions tagged :