Modeling many-to-many relationships

  • Question

  • One would normally model such a relationship between two table entities with a link table. If the relationship needs to be extended in the future, the necessary column(s) can easily be added to the link table, which thereby becomes a full-fledged table entity in its own right (a minimal sketch follows below).

    What if a system requires a lot of many-to-many relationships? If most user interactions involve many joins across multiple tables, is there anything to be said against modeling many-to-many relationships this way, with respect to performance (and other relevant attributes such as complexity, maintenance, etc.)?

    For example, one could choose not to model a many-to-many relationship at all (or any other kind of relationship, for that matter) and instead have two 'unrelated' tables; this is what I have seen in a live, legacy database.
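
    For illustration, here is that minimal sketch of the link-table approach (all table and column names, such as Student and Course, are hypothetical and only stand in for the real entities):

        CREATE TABLE dbo.Student (
            StudentId INT NOT NULL PRIMARY KEY,
            Name      NVARCHAR(100) NOT NULL
        );

        CREATE TABLE dbo.Course (
            CourseId INT NOT NULL PRIMARY KEY,
            Title    NVARCHAR(100) NOT NULL
        );

        -- The link table models the many-to-many relationship; extra columns
        -- (e.g. EnrolledOn) can be added later, making it a table entity in its own right.
        CREATE TABLE dbo.StudentCourse (
            StudentId  INT  NOT NULL REFERENCES dbo.Student (StudentId),
            CourseId   INT  NOT NULL REFERENCES dbo.Course (CourseId),
            EnrolledOn DATE NULL,
            CONSTRAINT PK_StudentCourse PRIMARY KEY (StudentId, CourseId)
        );

        -- A typical user interaction then joins across the link table.
        SELECT s.Name, c.Title
        FROM dbo.Student AS s
        JOIN dbo.StudentCourse AS sc ON sc.StudentId = s.StudentId
        JOIN dbo.Course AS c ON c.CourseId = sc.CourseId;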

    Kind regards.
    Thursday, July 30, 2009 6:51 AM

All replies

  • Many-to-many relationships are an indication of complex data. Still, this shouldn't be a reason to just ditch cross-reference tables in your database design. That is a bad idea!

    I'm used to creating an optimal design, which potentially contains a lot of many-to-many relationships (5th normal form). From that point we optimize the database design for its usage scenarios, continuously refactoring the design while taking maintenance aspects into account. We also document these optimizations in the design.


    http://www.paulgielens.com
    • Proposed as answer by Paul Gielens Thursday, July 30, 2009 7:40 AM
    • Marked as answer by darmir Thursday, July 30, 2009 7:03 PM
    Thursday, July 30, 2009 7:40 AM
  • It is important to model many-to-many relationships using a third entity (or table). This is required for the future extensibility of the system, as you have mentioned. It also eliminates redundancy (data duplication), which in turn contributes to improved performance and maintainability.

    Additionally, performance can be improved further by planning appropriate indexes, creating views, etc. (a small sketch follows below).

    Not modeling this type of relationship distributes the ownership of data across the database, causing data-ownership, data-accuracy, and data-upgrade and migration issues in the long run.
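
    As a hedged sketch only (reusing the hypothetical Student/Course schema from the question above), the link table's composite primary key covers lookups in one direction, while an extra index and a view can support the other direction and hide the join from consumers:

        -- The primary key (StudentId, CourseId) supports Student-to-Course lookups;
        -- this index makes the reverse direction (Course-to-Student) efficient.
        CREATE NONCLUSTERED INDEX IX_StudentCourse_CourseId
            ON dbo.StudentCourse (CourseId, StudentId);
        GO

        -- A view hides the join from consumers of the data.
        CREATE VIEW dbo.StudentEnrollment
        AS
        SELECT s.StudentId, s.Name, c.CourseId, c.Title
        FROM dbo.Student AS s
        JOIN dbo.StudentCourse AS sc ON sc.StudentId = s.StudentId
        JOIN dbo.Course AS c ON c.CourseId = sc.CourseId;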

    Thursday, July 30, 2009 12:00 PM
  • Thank you for the replies!


    I agree that not modeling relationships is generally considered bad practice... or blasphemy for some ;)

    If during early performance testing some critical queries do not perform well, would merging tables and thereby compromising on normalization be a viable option?

    Are there any other options to consider in such a situation?
    Thursday, July 30, 2009 7:02 PM
  • Merging tables and compromising on normalization is very much a viable option for a limited number of scenarios in OLTP applications.
    Before doing so, consider features like data partitioning, query tuning, index creation and optimization, indexed views (materialized views), summary tables, and so on (an indexed-view sketch follows below).
    Data-warehousing applications usually have denormalized tables arranged in a specialized manner called a star schema or snowflake schema.
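
    Purely as an illustration of the indexed-view option (again on the hypothetical Student/Course schema used earlier in this thread), SQL Server can persist the pre-joined rows instead of the base tables being merged:

        -- Indexed views require SCHEMABINDING and two-part table names
        -- (and assume the usual required SET options, e.g. QUOTED_IDENTIFIER ON).
        CREATE VIEW dbo.vStudentCourse
        WITH SCHEMABINDING
        AS
        SELECT sc.StudentId, sc.CourseId, s.Name, c.Title
        FROM dbo.StudentCourse AS sc
        JOIN dbo.Student AS s ON s.StudentId = sc.StudentId
        JOIN dbo.Course  AS c ON c.CourseId = sc.CourseId;
        GO

        -- The unique clustered index is what materializes the view.
        CREATE UNIQUE CLUSTERED INDEX IX_vStudentCourse
            ON dbo.vStudentCourse (StudentId, CourseId);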
    • Marked as answer by darmir Friday, July 31, 2009 8:09 AM
    Friday, July 31, 2009 7:44 AM
  • You have to be really careful that you have valid tests for performance - i.e. both under load and not under load, for a number of different queries, and relevant to your solution.
    Often a change you make to improve one area will degrade another, thus the overall improvement is zero.


    If you denormalise you might find that you have problems with indexes getting fragmented, or retrieval of data being incredibly cumbersome.  I would advise that you avoid denormalising the data unless you fully understand the implications.

    Many-to-many relationships come about in most solutions I have ever worked on, and are not necessarily an indication of complexity.  My experience is actually the opposite: denormalising to make something 'easier' often has the complete opposite effect in the long run.

    You need to be very clear about how good 'good enough' is.  Is it realistic for your users, your application performance, your designs?  Will it cause time-outs? (If so, it needs looking at, and this is generally caused by denormalised data in a highly transactional system.)  What is quick enough?  20 milliseconds?  It depends on what the query is; there is no point in performance tuning something that really doesn't matter.  Think of it as setting expectations for when the solution becomes very large and performance is critical.

    Creating indexes and views will only go so far; good design means that SQL Server can manage itself and be efficient.  There are definitely better-performing ways to do things, such as using stored procedures, which are generally the better implementation, but it all depends on the structure and design of the database.
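
    Purely as an illustration of the stored-procedure point (hypothetical schema again), a small parameterised procedure wrapping the join might look like this:

        CREATE PROCEDURE dbo.GetCoursesForStudent
            @StudentId INT
        AS
        BEGIN
            SET NOCOUNT ON;
            SELECT c.CourseId, c.Title
            FROM dbo.StudentCourse AS sc
            JOIN dbo.Course AS c ON c.CourseId = sc.CourseId
            WHERE sc.StudentId = @StudentId;
        END;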

    Something to consider: if denormalising improves the performance of the query that retrieves the data, but inserting into and deleting from this table, which has multiple indexes, causes the indexes to become fragmented and thus slows the table down, the 'performance improvement' could actually become the cause of your performance concerns.  The difficulty is that once you implement the schema it is hard to change; however, indexes, stored procs and views on a nicely normalised database can be changed as required with a lot less impact.
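
    To make the fragmentation point concrete, a hedged sketch of how one might check fragmentation on the (hypothetical) link table and then reorganise its indexes:

        -- Inspect fragmentation per index on dbo.StudentCourse.
        SELECT i.name, ps.avg_fragmentation_in_percent, ps.page_count
        FROM sys.dm_db_index_physical_stats(
                 DB_ID(), OBJECT_ID(N'dbo.StudentCourse'), NULL, NULL, 'LIMITED') AS ps
        JOIN sys.indexes AS i
          ON i.object_id = ps.object_id AND i.index_id = ps.index_id;

        -- Reorganise (lightweight) or rebuild (heavier) once fragmentation grows.
        ALTER INDEX ALL ON dbo.StudentCourse REORGANIZE;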

    I hope this helps,

    Martin.
    MCSD, MCTS, MCPD. Please mark my post as helpful if you find the information good!
    • Marked as answer by darmir Tuesday, August 4, 2009 7:25 AM
    Monday, August 3, 2009 3:24 AM