Viewed   79 times

I'm am designing my database/domain for an eCommerce application and I'm having a hard time figuring out how to store products.

The website will sell a wide range of products, pens, thongs, tattoos, umbrellas, everything. Each of these product will share a few common attributes, height, width, length, weight, etc but some products have special data. For example, pens have different ink colors and tips/lids and brochures can have different types of folds. So far I have thought up some 20+ extra attributes, but these attributes may only apply to 1% of products on the website.

So I am wondering if it is appropriate to implement a EAV model to handle the extra data. Keeping in mind that when customers are viewing the site in the frontend, there will be a filtering sidebar like on eBay and (So keeping in mind there will be a fair bit of querying)

I don't think it's practical to implement Class Table inheritance as the system needs to remain flexible. This is because, down the track we may have more attributes in the future with new types of products.

The other thing I have considered is using a NoSQL database (probably MongoDB) however I have little experience with these types of databases, will it even solve my problem?

Review of options:

  1. Single products entity with lots of columns
  2. Separate attributes entity (EAV)
  3. Switch to schema-less persistence

I'm in the process of building a prototype with an attributes entity to see how flexible it is, and testing the performance and how out of control the querying gets.

EDIT: I am, of course, open to any other solutions.



Great question, but of course, there is no "one true way". As per @BenV, Magento does use the EAV model. My experience with it has been overwhelmingly positive, however it does trip up other users. Some considerations:

1. Performance. EAV requires complex, multi-table joins to populate your object with the relevant attributes. That does incur a performance hit. However, that can be mitigated through careful caching (at all levels through the stack, including query caching) and the selective use of denormalization. Magento does allow administrators to select a denormalized model for categories and products where the number of SKUs warrants it (generally in the thousands). That in turn requires Observers that trigger re-indexing (always good!) and updates to the "flat" denormalized tables when product data changes. That can also be scheduled or manually triggered with a prompt to the administrator.

2. 3rd Party User Complexity If you ever plan to make this application available to other users, many will find EAV too complex and you'll end up dealing with a lot of bleating and uninformed abuse on the user forums (ref Magento!!).

3. Future extensibility and plugin architecture. There is no doubt that the EAV model really comes into it's own when extensibility is a factor. It is very simple to add new attributes into the model while minimizing the risk of breaking existing ORM and controller code.

4. Changes in datatype EAV does make it a little harder to alter attribute datatypes. If your initial design calls for a particular attribute datatype that changes in future (say int to varchar), it means that you will have to migrate all the records for that attribute to the corresponding table that matches the new datatype. Of course, purists would suggest that you get the design right first time, but reality does intrude sometimes!

5. Manual product imports One thing that EAV makes almost impossible is importing products (or other entities) into the database using SQL and/or phpMyAdmin-style CSV/XML. You'll need to write an Importer module that accepts the structured data and passes it through the application's Model layer to persist it to the database. That does add to your complexity.

Sunday, December 18, 2022

There's a few general pros and cons I can think of, there are situations where one is better than the other:

Option 1, EAV Model:

  • Pro: less time to design and develop a simple application
  • Pro: new entities easy to add (might even be added by users?)
  • Pro: "generic" interface components
  • Con: complex code required to validate simple data types
  • Con: much more complex SQL for simple reports
  • Con: complex reports can become almost impossible
  • Con: poor performance for large data sets

Option 2, Modelling each entity separately:

  • Con: more time required to gather requirements and design
  • Con: new entities must be modelled and designed by a professional
  • Con: custom interface components for each entity
  • Pro: data type constraints and validation simple to implement
  • Pro: SQL is easy to write, easy to understand and debug
  • Pro: even the most complex reports are relatively simple
  • Pro: best performance for large data sets

Option 3, Combination (model entities "properly", but add "extensions" for custom attributes for some/all entities)

  • Pro/Con: more time required to gather requirements and design than option 1 but perhaps not as much as option 2 *
  • Con: new entities must be modelled and designed by a professional
  • Pro: new attributes might be easily added later on
  • Con: complex code required to validate simple data types (for the custom attributes)
  • Con: custom interface components still required, but generic interface components may be possible for the custom attributes
  • Con: SQL becomes complex as soon as any custom attribute is included in a report
  • Con: good performance generally, unless you start need to search by or report by the custom attributes

* I'm not sure if Option 3 would necessarily save any time in the design phase.

Personally I would lean toward option 2, and avoid EAV wherever possible. However, for some scenarios the users need the flexibility that comes with EAV; but this comes with a great cost.

Tuesday, December 20, 2022

Here you go. An illustrative story:

Tuesday, November 1, 2022

What you need is a combination of EAV for product features and nested sets for product categories.

While I certainly agree that EAV is almost always a bad choice, one application where EAV is the perfect choice is for handling product attributes in an online catalog.

Think about how websites show product attributes... The attributes of products are always shown as a vertical list with two columns: "Attribute" | "Value". Sometimes these lists show side-by-side comparisons of multiple products. EAV works perfectly for doing this kind of thing. The things that make EAV meaningless and inefficient for most applications are exactly what makes EAV meaningful and efficient for product attributes in an online catalog.

One of the reasons why everyone always says "EAV is EVIL!" is that the attributes in EAV are "meaningless" insofar as the column name (i.e. meaning of the attribute) is table-driven and is therefore not defined by the schema. The whole point of schemas is to give your model meaning so this point is well taken. However in the case of an online product catalog, the meaning of product attributes is really unimportant to the system, itself. The only reason your catalog system cares about product attributes is to dump them in a list or possibly in a product comparison matrix. Therefore EAV is doesn't happen to be evil in this particular case.

For product categories, you want a nested set model, as I described in the answer to this question. Nested sets give you very quick retrieval along with the ability to traverse multiple levels of an unbalanced hierarchy at the expense of some precalculation effort at edit time.

Wednesday, October 12, 2022

There will be a limited amount of hair colors, so the preference table needs to have a hair_color_id and be set up like so:

class User < ActiveRecord::Base
  has_one :hair_color
  has_many :preferences

class HairColor < ActiveRecord::Base 
  belongs_to :user
  belongs_to :preference

class Preference < ActiveRecord::Base 
  belongs_to :user
  has_many :hair_color

I believe that's correct. Let me know if you run into any snags.

When you add eye color or any other characteristic, you'll probably have to do something different with preference. I'd have 4 columns at that point: id, user_id, foreign_id, foreign_type

foreign_id would be the id from the eye_color/hair_color table, and foreign_type would be "eye" or "hair" or something. Then in your model, you'd have something like this:

class HairColor < ActiveRecord::Base 
  belongs_to :user
  has_many :preferences, :foreign_key => :foreign_id, :conditions => { "preferences.foreign_type" => "hair" }

That gets a little crazy, but it's the most DRY way of doing it. You'd put the same thing in your eye_color.rb and just replace "hair" with "eye" for the foreign_type.

Wednesday, December 21, 2022
Only authorized users can answer the search term. Please sign in first, or register a free account.
Not the answer you're looking for? Browse other questions tagged :