Is Denormalization a Best Practice?

All replies

  • Hello Kalman,

    I think the degree of normalization depends on the scenario.

    In a non-ERP OLTP database I would always prefer to normalize the design to a high degree; it increases write performance and decreases the database size.

    An ERP OLTP database also has to meet some legal/tax-related requirements; you have to, for example, save the exact address and other tax-relevant information as of the actual point of saving/processing the order/invoice. In OLAP this would be an SCD (slowly changing dimension); in an OLTP database I would denormalize it and save the detail data with every record.

    In a data warehouse the best practice is to denormalize the data to increase the performance of ad-hoc reporting, at the price of a higher data volume.


    Olaf Helper ----------- * cogito ergo sum * errare humanum est * quote erat demonstrandum * When I think, it is a mistake, and I prove that daily http://olafhelper.over-blog.de
    Saturday, October 23, 2010 6:44 AM
  • the best practice is to denormalize the data to increase the performance of ad-hoc reporting, at the price of a higher data volume.

    Certainly, Olaf. A data warehouse database contains static, second-hand data. "Flat dimension" is the term used for denormalized dimensions (to keep the number of JOINs down for performance reasons).
    Kalman Toth, SQL Server & Business Intelligence Training; Beginner Database Design & SQL Programming Using Microsoft SQL Server 2016
    Saturday, October 23, 2010 7:10 AM
    Moderator
  • An ERP OLTP database also has to meet some legal/tax-related requirements; you have to, for example, save the exact address and other tax-relevant information as of the actual point of saving/processing the order/invoice.

    Olaf,

    That is a good point. In that kind of scenario you don't have a choice as a database designer.

    Another point: you can also end up with a "denormalized" (really unnormalized) database by not having a database designer on the project and having less-than-senior SQL developers "throw" it together in the middle of programming: "oops, we need a new table again; ready in 5 minutes with the table designer".

     


    Kalman Toth, SQL Server & Business Intelligence Training; SQL 2008 GRAND SLAM
    Saturday, October 23, 2010 1:56 PM
    Moderator
  • An ERP OLTP database also has to meet some legal/tax-related requirements; you have to, for example, save the exact address and other tax-relevant information as of the actual point of saving/processing the order/invoice.

    Olaf,

    That is a good point. In that kind of scenario you don't have a choice as a database designer.


    Storing address and tax info for each invoice has nothing to do with denormalization. It's hard to imagine that any legislation would ever demand denormalization (a law requiring normalized database design might be a good idea I suppose - although it would be very unpopular! :)

    Denormalization has nothing to do with performance either. Normal Forms describe the logical state of the database but say nothing about storage structures or physical operations used in the database. It is the storage and physical aspects that determine performance. This is an important distinction.

     

    Saturday, October 23, 2010 2:23 PM
  • Denormalization has nothing to do with performance either.

    Thanks David,

    Are you sure about that? I have seen performance (keeping the number of JOINs down, precalculating aggregates) given as a justification for denormalization.

    Related link:

    Denormalize for Performance

     


    Kalman Toth, SQL Server & Business Intelligence Training; SQL 2008 GRAND SLAM
    Saturday, October 23, 2010 2:35 PM
    Moderator
  • Denormalization has nothing to do with performance either.

    Thanks David,

    Are you sure about that? I have seen performance (keeping the number of JOINs down, precalculating aggregates) given as a justification for denormalization.

     


     


    Yes, I'm sure. Normalization tells us what the database looks like logically. It doesn't tell us what physical structures (files, indexes, dataspaces, storage, memory structures) are used to implement the database or what physical operations will be performed on it. Performance is always a function of those physical structures and operations. Any performance optimisations can therefore only be made by changing those physical aspects.

    Of course it's also true that DBMSs only give us a limited number of ways to implement any given logical design and that limits on physical database independence mean that compromising the logical design may sometimes be the only way to achieve certain physical changes. In all cases however, it's the physical implementation and the database workload that is the key to performance, not denormalization.

    Tuesday, October 26, 2010 9:06 PM
  • There is a prevailing perception that OLTP denormalization (avoiding JOINs) speeds up operations, which is probably true in many cases. The downside of denormalization, however, applies not just frequently but always.

    Related link:

    Frequently, however, performance needs dictate very quick retrieval capability for data stored in relational databases. To accomplish this, sometimes the decision is made to denormalize the physical implementation. Denormalization is the process of putting one fact in numerous places. This speeds data retrieval at the expense of data modification.


    Kalman Toth, SQL Server & Business Intelligence Training; SQL 2008 GRAND SLAM
    Tuesday, October 26, 2010 9:33 PM
    Moderator
  • Related link:

    Frequently, however, performance needs dictate very quick retrieval capability for data stored in relational databases. To accomplish this, sometimes the decision is made to denormalize the physical implementation. Denormalization is the process of putting one fact in numerous places. This speeds data retrieval at the expense of data modification.


    Kalman Toth, SQL Server & Business Intelligence Training; SQL 2008 GRAND SLAM


    Hmm. From that article: "it is possible that, given a changing hardware and software environment, denormalized tables may be causing performance degradation instead of performance gains". So it is the hardware and software environment that determines performance whereas denormalization is orthogonal to performance. What I said before.

    Interestingly the same article also says "Only one valid reason exists for denormalizing a relational design - to enhance performance". This is doubly untrue because there is at least one other reason to denormalize. In most DBMSs (SQL-based ones) denormalization allows you to implement certain constraints that could not otherwise be implemented. That's because multi-table constraints are not widely or well supported. Sometimes the only way to implement a business rule is to ensure you put the attributes concerned into the same table rather than separate tables.
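
    To make that concrete, here is a minimal, purely hypothetical sketch (the tables and the business rule are invented for illustration): the rule "only premium customers may get express shipping" cannot be written as a declarative CHECK while CustomerType lives only in a separate Customers table, because a CHECK constraint cannot reference another table; copy the attribute onto Orders and a plain CHECK becomes possible.

    -- Hypothetical tables; the copied CustomerType column is the denormalization
    CREATE TABLE Orders
    (
        OrderID      int     NOT NULL PRIMARY KEY,
        CustomerID   int     NOT NULL,
        CustomerType char(1) NOT NULL,  -- copied from Customers ('P' = premium)
        ExpressShip  bit     NOT NULL,
        -- Business rule enforced declaratively only because CustomerType is in this table
        CONSTRAINT ckExpressOnlyPremium
            CHECK (ExpressShip = 0 OR CustomerType = 'P')
    )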

    Tuesday, October 26, 2010 9:46 PM
  • Serendipitously enough, I'm having a similar discussion with David on normalization on another forum ... I pop in here, and who do I see?  :)

    I have to strenuously, strenuously disagree with SQLUSA's statement that "performance is never a reason to denormalize".   For anyone working in the real world - rather than ivory-tower academics - what matters is results.  Period.  Not elegance.

    I once saved a firm from purchasing over $1M in new servers, simply by denormalizing one column into two additional tables, thereby allowing a real-time performance constraint to be met with the existing hardware.    Do you think they cared that I introduced the need for a cascading update on a rarely-updated bit of data?

    And while I have seen the opposite case - where fully normalizing a schema did boost performance, thanks to the narrower table/index results - that's a bit of a red herring.  Such results are almost always the result of careless design by tyros.   Carefully crafted denormalization done for performance reasons will almost always yield results.  


    Michael Asher
    Tuesday, October 26, 2010 10:01 PM
  • David,

    Valid point about multi-table constraints. How do you handle it, as a database designer, when developers are begging for relief from queries with 10+ JOINs?


    Kalman Toth, SQL Server & Business Intelligence Training; SQL 2008 GRAND SLAM
    Tuesday, October 26, 2010 10:04 PM
    Moderator
  • I have to strenuously, strenuously disagree with SQLUSA's statement that "performance is never a reason to denormalize".   For anyone working in the real world - rather than ivory-tower academics - what matters is results.  Period.  Not elegance.

    I once saved a firm from purchasing over $1M in new servers,

    Michael,

    Actually I agree with you and disagree with myself on that... If you have to do a job under all different kinds of pressure, anything goes. Everybody wants success, not pretty code or design.

    However, once you put out the fire, and your management gives you the green light, it is time to pay attention to prettiness...

     BTW: Wonder if the HW vendor was happy with you?


    Kalman Toth, SQL Server & Business Intelligence Training; SQL 2008 GRAND SLAM
    Tuesday, October 26, 2010 10:13 PM
    Moderator
  • However, once you put out the fire, and your management gives you the green light, it is time to pay attention to prettiness...

    That I can agree with. I would never call denormalization per se a best practice. The rule should always be: normalize fully, then denormalize only to meet performance constraints.

    While I'm a huge fan of targeted denormalization, one of its nastier little gotchas is the hidden update cost.  Denormalize in a vacuum to optimize a particular query, and you might find performance suffering even more from the concurrency issues caused by that stream of cascading updates.

     


    Michael Asher
    Tuesday, October 26, 2010 10:20 PM
  • I once saved a firm from purchasing over $1M in new servers, simply by denormalizing one column into two additional tables, thereby allowing a real-time performance constraint to be met with the existing hardware.    Do you think they cared that I introduced the need for a cascading update on a rarely-updated bit of data?


    Michael Asher

    It sounds like what you really did was to change the internal storage such that the desired data was kept together in contiguous storage. Depending on what options the DBMS had, you might have been able to achieve that either with or without denormalization. It might also have been the case that denormalization would have made no difference at all (for example in a column store where different columns in the same table are not in contiguous storage).

    Saying that you denormalized it doesn't tell us what you actually did to improve the performance. It's a bit like saying "I painted this car red and then it went faster" without mentioning that when you painted it red you also happened to install a better engine as well!

     

     

    Tuesday, October 26, 2010 10:23 PM
  • It sounds like what you really did was to change the internal storage such that the desired data was kept together in contiguous storage... Saying that you denormalized it doesn't tell us what you actually did to improve the performance. 

    Actually, I stated specifically what was done to improve performance.  I replicated a column into two new tables, one of which was a six- or seven-level-deep child of the parent table containing the original column.  That eliminated a rather complex series of joins.  Performance on one of the two queries in question improved by several hundred percent.  It wasn't a matter of accidentally aligning data into contiguous storage.

    That's not a vanishingly rare corner case, either.  I've been denormalizing data for performance reasons for 25 years.  Modern RDBMSs have optimized joins to a near-miraculous degree, so denormalization isn't as necessary as it once was.  But that join cost still exists, and in certain scenarios it can be overwhelming, and the benefits of targeted denormalization more than outweigh the disadvantages.


    Michael Asher
    Tuesday, October 26, 2010 10:54 PM
  • Actually, I stated specifically what was done to improve performance.  I replicated a column into two new tables, one of which was a six- or seven-level-deep child of the parent table containing the original column.  That eliminated a rather complex series of joins.  Performance on one of the two queries in question improved by several hundred percent.  It wasn't a matter of accidentally aligning data into contiguous storage.
    Michael Asher


    Apparently you didn't get the point of my previous reply because you are still telling us what the colour of the car was and not what was happening under the hood.

    By the way my point was not that you "accidentally" changed anything but that you very deliberately intended to make some physical change. If you had not made a change to physical storage then you probably would not have seen any performance difference. Unfortunately you still haven't told us what the change was. We can probably guess but we wouldn't necessarily be right and in any case the answer wouldn't be related to denormalization because denormalization tells us nothing about how the data is stored.

     

    Tuesday, October 26, 2010 11:12 PM
  • David,

    Valid point about multi-table constraints. How do you handle it, as a database designer, when developers are begging for relief from queries with 10+ JOINs?


    Kalman Toth, SQL Server & Business Intelligence Training; SQL 2008 GRAND SLAM

    That's an easy one :-)  developers should never see, have access to, or care about your table structures. All they ever have access to are views and Stored Procedures written by a SQL professional.
    Dave Frommer - BI Architect Slalom Consulting (www.slalom.com)
    Wednesday, October 27, 2010 1:59 AM
  • Apparently you didn't get the point of my previous reply because you are still telling us what the colour of the car was and not what was happening under the hood.

    David, I think you may be unfamiliar with why denormalization is done.   I'll give a simple example:

    • A company has clients
    • Each client has jobs
    • Each job has phases
    • Each phase has tasks
    • Each task has workers

    In a normalized schema, if you want a list of workers and their company name, you must create the join chain worker->task->phase->job->client->company.   That takes work.   Now, physically, an RDBMS can implement that chain in many different ways.  Regardless of the underlying methodology, however, it still has to do the work somehow.  Getting hung up on physical implementation details can mask that fact.   It doesn't matter "how the data is stored", as you say.  The joins still must be done.

    A targeted denormalization would move the company name column to the workers table.   In a typical physical implementation, that would save 5 index seeks per row, plus some bookkeeping.  But no matter how it's implemented, you're still giving the engine less work to do, which means faster performance.

    There is a lengthy list of caveats to that, of course (I alluded to a few in my earlier posts).  However, the basic concept is inviolate.  Kalman expresses it well when he says denormalization is "putting one fact in multiple places".   Another way to look at this kind of denormalization is that it's essentially precalculating a join result and realizing that result into one or more physical tables.  Saves time over doing it on the fly ... but can cause modification anomalies and other issues.
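
    A rough T-SQL sketch of the two shapes, using invented table and column names for the hierarchy above (illustration only, not a real schema):

    -- Normalized: the company name is reachable only through the full join chain
    SELECT w.WorkerName, co.CompanyName
    FROM Workers w
      INNER JOIN Tasks t      ON t.TaskID     = w.TaskID
      INNER JOIN Phases p     ON p.PhaseID    = t.PhaseID
      INNER JOIN Jobs j       ON j.JobID      = p.JobID
      INNER JOIN Clients c    ON c.ClientID   = j.ClientID
      INNER JOIN Companies co ON co.CompanyID = c.CompanyID

    -- Targeted denormalization: CompanyName copied onto Workers,
    -- trading update/maintenance cost for a join-free read
    SELECT w.WorkerName, w.CompanyName
    FROM Workers w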

     


    Michael Asher
    Wednesday, October 27, 2010 2:05 AM
  • David,

    Valid point about multi-table constraints. How do you handle it, as a database designer, when developers are begging for relief from queries with 10+ JOINs?


    Kalman Toth, SQL Server & Business Intelligence Training; SQL 2008 GRAND SLAM

    That's an easy one :-)  developers should never see, have access to, or care about your table structures. All they ever have access to are views and Stored Procedures written by a SQL professional.
    Dave Frommer - BI Architect Slalom Consulting (www.slalom.com)
    So true... Well said.
    Wednesday, October 27, 2010 2:17 AM
  •  developers should never see, have access to, or care about your table structures. All they ever have access to are views and Stored Procedures written by a SQL professional.
    What if you, the database designer, are the only SQL professional on a team with 10 developers of varying SQL skill, and the developers are required to write their own stored procedures?
    Kalman Toth, SQL Server & Business Intelligence Training; SQL 2008 GRAND SLAM
    Wednesday, October 27, 2010 5:28 AM
    Moderator
  • Michael:
    "It doesn't matter "how the data is stored", as you say.  The joins still must be done."
    "no matter how its implemented, you're still giving the engine less work to do, which means faster performance."

    But these statements are false because the implementation in physical storage could conceivably be identical in each case and therefore the work of retrieving them would be exactly the same. For instance suppose the join is already persisted in storage. Or suppose a columnar model is used where the work to retrieve attributes from separate tables is the same as from the same table. (I'm ignoring any small query cost associated with metadata changes because that work might be done at compile time rather than runtime).

    It doesn't make sense to talk about normalization/denormalization and performance without reference to the implementation in storage and the type of operations done on the data. The fact that people DO do this explains why there are just as many people claiming that normalization has improved performance for them as there are claiming that denormalization did the same. Obviously both groups are wrong!

     

    Wednesday, October 27, 2010 6:16 AM
  •  developers should never see, have access to, or care about your table structures. All they ever have access to are views and Stored Procedures written by a SQL professional.
    What if you, the database designer, are the only SQL professional on a team with 10 developers of varying SQL skill, and the developers are required to write their own stored procedures?
    Kalman Toth, SQL Server & Business Intelligence Training; SQL 2008 GRAND SLAM


    Change jobs? :-)

    I have had those situations; I pick 1 or 2 of the devs with the best SQL abilities and have them write the SPs. Then I check them thoroughly. I usually have to write some of the more difficult ones in any case.

    This is one area I do not compromise on anymore, as I have seen too many issues with allowing direct access to tables from code over the 26 years I have been doing database work.


    Dave Frommer - BI Architect Slalom Consulting (www.slalom.com)
    Wednesday, October 27, 2010 10:21 AM
  • But these statements are false because the implementation in physical storage could conceivably be identical in each case and therefore the work of retrieving them would be exactly the same.

    No, it couldn't. It's physically impossible, for the example I gave ... and impossible in general for SQL Server.
    For instance suppose the join is already persisted in storage.

    a) We're talking about SQL Server here.

    b) Even assuming some hypothetical product capable of physically persisting long join chains, you *still* don't get an identical storage implementation. You may be thinking not of a physically persisted join, but of a persisted join index. However, even here, you couldn't get an identical implementation - there are more than two tables involved.

    Or suppose a columnar model is used where the work to retrieve attributes from separate tables is the same as from the same table. (I'm ignoring any small query cost associated with metadata changes because that work might be done at compile time rather than runtime).

     

    Columnar databases don't work the way you think they do. Retrieving an attribute functionally dependent on a superparent column requires walking the join chain. Not only will the physical implementation not be identical, but the query cost will be higher as well. Degenerate cases where this isn't true don't exist for long join chains.

    Try it yourself, using the example I gave, on any columnar database you wish.  If you flatten a six-way join into a logical row via a single-column denormalization and don't see a large performance gain, I'll happily eat my hat.  

    It doesn't make sense to talk about normalization/denormalization and performance without reference to the implementation in storage and the type of operations done on the data. The fact that people DO do this explains why there are just as many people claiming that normalization has improved performance for them as there are claiming that denormalization did the same.

    This is a specious argument. Performance is a practical trait, not a theoretical one. If one denormalizes a database and performance doubles as a result -- the denormalization improved performance.  Here in the real-world, denormalization is an effective performance booster. So much so, in fact, that the TPC benchmark suite is specifically optimized to prevent transparent denormalization.

     


    Michael Asher
    Wednesday, October 27, 2010 11:46 AM
  •  If you flatten a six-way join into a logical row via a single-column denormalization and don't see a large performance gain, I'll happily eat my hat.  

    .... and I eat my fish sandwich gladly!

    If you put all the columns into a single table row/record, a performance gain naturally follows. It is a very similar concept to a covering index ( http://sqlusa.com/bestpractices/coveringindex/ ), which keeps all the data for a particular query in the leaf records.

    But even a covering index is only recommended as a sure-fire, stable workaround.
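
    For reference, a covering index of that kind looks roughly like this (table and column names are hypothetical):

    -- The key column serves the seek; the INCLUDEd columns let the query
    -- be answered entirely from the index leaf level, without the base table
    CREATE NONCLUSTERED INDEX ixOrdersCustomerCovering
        ON Orders (CustomerID)
        INCLUDE (OrderDate, TotalDue)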


    Kalman Toth, SQL Server & Business Intelligence Training; SQL 2008 GRAND SLAM
    Wednesday, October 27, 2010 3:41 PM
    Moderator
  •  If you flatten a six-way join into a logical row via a single-column denormalization and don't see a large performance gain, I'll happily eat my hat.  

    .... and I eat my fish sandwich gladly!

    If you put all the columns into a single table row/record, a performance gain naturally follows. It is a very similar concept to a covering index ( http://sqlusa.com/bestpractices/coveringindex/ ), which keeps all the data in the leaf records.


    Kalman Toth, SQL Server & Business Intelligence Training; SQL 2008 GRAND SLAM

    I do want to add the caveat that over-extensive denormalization can hurt performance.  You're widening your rows, after all, and especially in the case of a query that doesn't require the columns you've added, you can see a small hit.

    Of course, targeted denormalization isn't about squeezing the entire master row into every child.  It's about addressing performance-robbing deep joins that exist only to locate a single column, or small subset of columns.  In this case, no modern DBMS is going to see anything but a large gain from such denormalization.

     


    Michael Asher
    Wednesday, October 27, 2010 3:52 PM
  • The following AdventureWorks example shows that the denormalized table is somewhat faster for a particular query:

    -- Create two normalized (City, normAddress) tables & a denormalized table Address
    SELECT AddressID
       ,AddressLine1
       ,AddressLine2
       ,City
       ,StateProvinceID
       ,PostalCode
       ,rowguid
       ,ModifiedDate
     INTO Address
     FROM AdventureWorks.Person.Address
    
    SELECT CityID = ROW_NUMBER() OVER ( Order By City), City
      INTO City
      FROM AdventureWorks.Person.Address
      GROUP BY City
    ALTER TABLE City ALTER COLUMN CityID INT NOT NULL
     
     
    SELECT AddressID
       ,AddressLine1
       ,AddressLine2
       ,(SELECT c.CityID FROM City c WHERE c.City = a.City) AS CityID 
       ,StateProvinceID
       ,PostalCode
       ,rowguid
       ,ModifiedDate
     INTO normAddress
     FROM AdventureWorks.Person.Address a
     
    ALTER TABLE Address ADD CONSTRAINT pkAddress PRIMARY KEY (AddressID)
    ALTER TABLE normAddress ADD CONSTRAINT pknormAddress PRIMARY KEY (AddressID)
    ALTER TABLE City ADD CONSTRAINT pkCity PRIMARY KEY (CityID)
    ALTER TABLE normAddress ADD CONSTRAINT fknormAddress FOREIGN KEY ( CityID) REFERENCES City(CityID)
    
    /******************************************************/
    -- Denormalized relative cost 43%
    SELECT City, CityCount = COUNT(*)
    FROM Address
    WHERE City like '%b%'
    GROUP BY City ORDER BY CityCount DESC
    -- (117 row(s) affected)
    
    -- Normalized relative cost 57%
    SELECT City, CityCount = COUNT(*)
    FROM normAddress a
     INNER JOIN City c
      ON a.CityID = c.CityID
    WHERE c.City like '%b%'
    GROUP BY City ORDER BY CityCount DESC
    -- (117 row(s) affected)
    /******************************************************/
    
    GO -- Cleanup
    DROP TABLE Address
    DROP TABLE normAddress
    DROP TABLE City

     


    Kalman Toth, SQL Server & Business Intelligence Training; SQL 2008 GRAND SLAM

    Wednesday, October 27, 2010 8:59 PM
    Moderator
  • Kalman, in considering the case for denormalization, there are sometimes reasons to do it besides simply increasing performance.    The article below lists some examples of when denormalization can pay off in other ways:

    http://www.sql-server-performance.com/articles/dev/denormalization_profit_p1.aspx

     


    Michael Asher
    Thursday, November 18, 2010 6:15 AM
  • Thanks Michael.

    Quoted parts from the above-referenced sql-server-performance.com article:

    "Denormalization in SQL Server for Fun and Profit

    Almost from birth, database developers are taught that their databases must be normalized.  In many shops, failing to fully normalize can result in anything from public ridicule to exile to the company’s Siberian office.  Rarely discussed are the significant benefits that can accrue from intentionally denormalizing portions of a database schema.  Myths about denormalization abound, such as:

    •A normalized schema is always more stable and maintainable than a denormalized one.
    •The only benefit of denormalization is increased performance.
    •The performance increases from denormalization aren’t worth the drawbacks."

    Siberia goes the other way too: Alaska! If you try to introduce 3NF design techniques in a messy db shop, you may even get fired! Unless you have very good corporate (tiny to large) diplomatic skills, don't even try to convert them to 3NF; just design so that you "blend into the team", and you will get your paycheck. 3NF design can be a very emotional and political issue in "mixed" teams.

    My biggest problem with "denormalized" is that nobody is willing to give a rule: where is the delineation between denormalized and an absolute mess of a database design? 5%? 40%?

    I support the following tiny departure from 3NF design: 1-to-1 vertical partitioning of frequently used columns/attributes from infrequently used columns.
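
    A rough sketch of that kind of 1-to-1 vertical split, with invented table names; the wide, rarely used columns move to a companion table that shares the primary key:

    -- "Hot" table: the frequently used columns
    CREATE TABLE Products
    (
        ProductID   int           NOT NULL PRIMARY KEY,
        ProductName nvarchar(100) NOT NULL,
        ListPrice   money         NOT NULL
    )

    -- Companion table: infrequently used, wide columns, 1-to-1 with Products
    CREATE TABLE ProductDetails
    (
        ProductID       int            NOT NULL PRIMARY KEY
            REFERENCES Products (ProductID),
        LongDescription nvarchar(max)  NULL,
        Photo           varbinary(max) NULL
    )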

     

     


    Kalman Toth, SQL Server & Business Intelligence Training; SQL 2008 GRAND SLAM
    Thursday, November 18, 2010 3:40 PM
    Moderator
  • nobody is willing to give a rule: where is the delineation between denormalized and an absolute mess of a database design? 5%? 40%?

    I think you're looking at it the wrong way.  As compelling as the benefits of denormalization are, normalization should always be the default.  Normalize fully, then denormalize only to solve specific problems.   The article I linked gave some examples of why you'd denormalize to meet business rule requirements.   Performance is another main reason.  But you never denormalize just to "make things faster" for the sheer joy of it.  You'd do so only to solve specific performance problems, e.g. "this query is running in 60 seconds and the business requirements say it must take less than 10".

     


    Michael Asher
    Thursday, November 18, 2010 5:37 PM
  • Thanks Michael. How about denormalizing because developers are tired of too many JOINs?

    Denormalization-related TechNet article:

    Optimizing the Database Design by Denormalizing

    You implement normalization with a specific purpose: to maintain data integrity. However, in a real-life project, you typically have to bring back some data redundancy either for performance reasons or to maintain a history.

    ....


    Still, denormalization brings the danger of update anomalies back to the database. Therefore, you have to do it deliberately. You should document any denormalization thoroughly. To make sure your application correctly maintains denormalized data, you need to use transactions appropriately. A transaction is the smallest unit of work that must either complete entirely or not at all. For example, in the Invoices system, a transaction might mean an insertion to a table such as InvoiceDetails, which must be followed immediately by an update of the derived-level column TotalInStock in the Product table or a StockLevel column in the ProductsInWarehouses table, depending where you maintain the stock level information. If one of those actions fails, the entire transaction is rolled back to ensure data consistency. An even better choice for maintaining denormalized data, however, is to leave that task to the RDBMS. You can do this by introducing data manipulation language (DML) triggers. An RDBMS fires a trigger automatically as part of a transaction. In the invoices example, a DML trigger for insert, update, and delete on the InvoiceDetails table can maintain the TotalInStock column and the ProductsInWarehouses table. In addition, you should have procedures in place to rebuild the derived data from scratch. You can always rebuild this data from events tables in case of inconsistency between the sum of events and states.

    http://technet.microsoft.com/en-us/library/cc505841.aspx
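
    A much-simplified sketch of the kind of trigger the article describes, assuming hypothetical InvoiceDetails (ProductID, Quantity) and Product (ProductID, TotalInStock) tables; a real version would also have to handle UPDATE and DELETE as the article notes:

    CREATE TRIGGER trInvoiceDetailsInsert
    ON InvoiceDetails
    AFTER INSERT
    AS
    BEGIN
        SET NOCOUNT ON;
        -- Runs inside the INSERT's transaction, so the denormalized
        -- TotalInStock column stays consistent with the detail rows
        UPDATE p
           SET p.TotalInStock = p.TotalInStock - i.QtySold
          FROM Product AS p
          INNER JOIN (SELECT ProductID, SUM(Quantity) AS QtySold
                        FROM inserted
                       GROUP BY ProductID) AS i
            ON i.ProductID = p.ProductID;
    END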

     

     


    Kalman Toth, SQL Server & Business Intelligence Training; SQL 2008 GRAND SLAM
    Thursday, November 18, 2010 7:58 PM
    Moderator
  • An even better choice for maintaining denormalized data, however, is to leave that task to the RDBMS. You can do this by introducing data manipulation language (DML) triggers...


    That's a good article.  Also, the earlier one I posted gives some examples of using not only triggers but also constraints to prevent introducing anomalies into a denormalized database.

     

    How about denormalizing because developers are tired of too many JOINs?

     

    That's why god created views :-)
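
    For instance, a view over the hypothetical worker/company hierarchy from earlier lets developers query one object and never write (or even see) the joins (purely illustrative):

    CREATE VIEW dbo.vWorkerCompany
    AS
    -- Hides the join chain; developers SELECT from the view only
    SELECT w.WorkerID, w.WorkerName, co.CompanyName
    FROM Workers w
      INNER JOIN Tasks t      ON t.TaskID     = w.TaskID
      INNER JOIN Phases p     ON p.PhaseID    = t.PhaseID
      INNER JOIN Jobs j       ON j.JobID      = p.JobID
      INNER JOIN Clients c    ON c.ClientID   = j.ClientID
      INNER JOIN Companies co ON co.CompanyID = c.CompanyID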


    Michael Asher
    Friday, November 19, 2010 3:23 AM
  • •A normalized schema is always more stable and maintainable than a denormalized one.
    •The only benefit of denormalization is increased performance.
    •The performance increases from denormalization aren’t worth the drawbacks."


    I disagree with the 2nd item, and completely disagree with the 3rd item.

    The 2nd item seems to suggest that denormalizing will guarantee better performance, which is not true. I agree with others that normalization should be the default, and denormalization just a tool to improve performance where possible and necessary.

    Having said that, it is clear that denormalization can sometimes increase performance. There are so many ways in which one can denormalize that the statement that "the performance increases aren't worth the drawbacks" is simply silly.

    A very good example was given in this thread, where an id is pushed down in a "tree".

    Another example is where a table is split up into two (with a one-to-one relation), which can be very efficient when many queries have to do a table scan but will never access certain large columns.

    Obviously, you should only do such things if the normalized system is causing performance problems, or you can reliably predict that it will.

    Many types of denormalization that were done in the past are not necessary anymore; if you run the newest and most expensive edition, then indexed views, partitioned tables, filtered indexes, etc. can be used to achieve better performance, without the need to denormalize in certain situations.
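
    For example, an indexed view over the normalized tables from the earlier AdventureWorks experiment could materialize the join and aggregate so that reads avoid the work without denormalizing the base tables (a sketch only, assuming the tables live in the dbo schema and ignoring the usual SET-option requirements):

    CREATE VIEW dbo.vCityAddressCounts
    WITH SCHEMABINDING
    AS
    SELECT c.CityID, c.City, COUNT_BIG(*) AS AddressCount
    FROM dbo.normAddress a
      INNER JOIN dbo.City c ON c.CityID = a.CityID
    GROUP BY c.CityID, c.City
    GO
    -- The unique clustered index is what materializes and maintains the view
    CREATE UNIQUE CLUSTERED INDEX ixvCityAddressCounts
        ON dbo.vCityAddressCounts (CityID)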

    I think that in this area, SQL Server can still be improved. That way, we would need to denormalize less and less, and wouldn't often have to write convoluted queries anymore to get the best performance. You can find an example of a situation that could "easily" be improved at http://www.xs4all.nl/~gertjans/sql/example1/top-with-left-join.html

    Also, features like indexed views could be massively improved if they were redesigned to support subqueries, outer joins, etc. That would be an interesting project to be on...

    --

    Gert-Jan

    Friday, November 19, 2010 6:29 PM
  • Many types of denormalization that were done in the past are not necessary anymore; if you run the newest and most expensive edition, then indexed views, partitioned tables, filtered indexes, etc. can be used to achieve better performance, without the need to denormalize in certain situations.

    Agreed. From a performance perspective, sure, these can often substitute for denormalization.

    The purpose of that article, however, was to illustrate some of the reasons you would want to denormalize other than increased performance.

     

    There are so many ways in which one can denormalize that the statement that "the performance increases aren't worth the drawbacks" is simply silly.
    The article gives that as an example of a myth about denormalization, not a statement of fact.
    Michael Asher
    Friday, November 19, 2010 8:44 PM