none
When is it appropriate to use the pivot statement?

    Question

  • I have a web application where a number of account variables are stored in an SQL 2005 database table. Currently I am storing basic account information such as name, address, and phone number. Furthermore, there are a few variables stored in the same table to determine a level of access an account has to the content in the web application. For instance account A has access to streaming video and PDF articles while account B only has access to the videos.

    I am about to add 2 more products (different types of videos and articles) and am wondering whether it is better to continue to add columns to the accounts table to grant permission to an account for a new product, or whether I should create an account settings table and use the pivot statement so I could simply add rows to the account settings table to grant an account access to another product.

    Has anyone thought through this before? I can see benefits both ways. My first thought regarding this is one of efficiency. This query is run very often. If we had 10 different products but most accounts only used 2 of them would it be better to return 10 columns every time someone logged in or would it be better to process the pivot and only return 2 columns. What if there were 100 products and most accounts only used 2 of them?

    Thanks

    Monday, June 29, 2009 4:19 PM

Answers

  • I agree with Plamen.  Normalizing the data will lead to a more scalable long term solution, considering you have to add columns every time a new media type comes out.  Concerning the pivot, you are thinking in terms of the application performance and not considering what is best for the database layer.  While pivot is a good function to have there can be a lot of overhead in using it. As Plamen stated, there is no such thing as a native dynamic pivot where the number of values is unknown, so you will have to use dynamic sql to pivot the data, which can make your database more vulnerable to sql injection attack.

     All-in-all what I am saying is a single column value may be best for your application, but it is not best for the database layer.  The database layer is more manageable with normalized tables and usually more performant when working with sets of data.I dont see a need to add extra overhead to your database layer and open your database to sql injection attack by pivoting the data.
    http://jahaines.blogspot.com/
    • Marked as answer by Jimmykang Tuesday, July 07, 2009 2:43 PM
    Monday, June 29, 2009 5:42 PM
    Moderator

All replies

  • You should normalize your design and use separate tables to store the access tokens (products) in one table and then those associated with specific account to another (separate from the table with accounts). There should be no need to pivot to check if account has access.


    Plamen Ratchev
    Monday, June 29, 2009 4:36 PM
    Moderator
  • Are you saying that each account access token should be it's own row? Meaning that I would have to loop through an account's access tokens to determine if an account has access to a specific product as opposed to checking a specific column of one returned row (if a pivot was done)?

    Or perhaps I am misunderstanding you.
    Monday, June 29, 2009 5:10 PM
  • Correct. You can return a result set of all access tokens for an account to the client and there validate access via a loop. You could pivot the data, but since you do not know how many products there could be you have to use dynamic pivoting, which currently requires some form of dynamic SQL to implement.
    Plamen Ratchev
    Monday, June 29, 2009 5:25 PM
    Moderator
  • I agree with Plamen.  Normalizing the data will lead to a more scalable long term solution, considering you have to add columns every time a new media type comes out.  Concerning the pivot, you are thinking in terms of the application performance and not considering what is best for the database layer.  While pivot is a good function to have there can be a lot of overhead in using it. As Plamen stated, there is no such thing as a native dynamic pivot where the number of values is unknown, so you will have to use dynamic sql to pivot the data, which can make your database more vulnerable to sql injection attack.

     All-in-all what I am saying is a single column value may be best for your application, but it is not best for the database layer.  The database layer is more manageable with normalized tables and usually more performant when working with sets of data.I dont see a need to add extra overhead to your database layer and open your database to sql injection attack by pivoting the data.
    http://jahaines.blogspot.com/
    • Marked as answer by Jimmykang Tuesday, July 07, 2009 2:43 PM
    Monday, June 29, 2009 5:42 PM
    Moderator
  • Plamen and Adam,

    I have done a bit more research to understand what you mean by normalize your design and I do not understand why normalizing the db design requires storing products in a separate table. Please forgive any ignorance. My knowledge of normalization comes from an undergrad course in databases almost 10 years ago and what I read in wikipedia just recently as a refresher.

    consider a table [Account]
    {accountid int, accountname varchar, ..., product1 bit, product2 bit, product3 bit, ...,product_N bit}

    It may break 1nf according to Chris Date who says that there should be no nullable columns (which seems to be controversial). However, other than that:
        1. There's no top-to-bottom ordering to the rows.
        2. There's no left-to-right ordering to the columns.
        3. There are no duplicate rows.
        4. Every row-and-column intersection contains exactly one value from the applicable domain (and nothing else).
        5. All columns are regular [i.e. rows have no hidden components such as row IDs, object IDs, or hidden timestamps].

    none of it's non-prime attributes (accountname-product_n) are functionally dependent on part of a candidate key.  So it is 2nf.

    no non-prime attributes are transitively dependent on the a candidate key so it is 3nf.

    for all of it's functional dependencies are x->y x is either a candidate key or a superset of that candidate key so it is in BCNF

    1. How does separating accounts and access tokens normalize the design?

    2. Does the MSDBE work more efficiently for tables normalized according to Chris Date's definition of 1nf?
    Friday, July 03, 2009 5:38 PM
  • This design "technically" may comply with normalization rules. But it is not a good approach. First, it has repeating group of columns with similar name. Next, it has a built-in table constraint - there could be only N products in the table. Under normalizing I did not meant to comply with normalization forms, but to common sense logical normalization of the data into separate tables. There could be many debates on what is good design, but to me makes a lot more sense to model this as Accounts {account_id, account_name}, Products (product_id, product_name}, and AccountProducts {account_id, product_id}.


    Plamen Ratchev
    Friday, July 03, 2009 8:31 PM
    Moderator
  • You "may" find this exchange useful: http://www.dbdebunk.com/page/page/622318.htm

    cheers,
    </wqw>
    Saturday, July 04, 2009 9:42 AM
  •  As Plamen stated, there is no such thing as a native dynamic pivot where the number of values is unknown, so you will have to use dynamic sql to pivot the data, which can make your database more vulnerable to sql injection attack.



    Reporting Services has native dynamic pivot capabilities, it is called Matrix Report.

    If in a 3NF design (as suggested above), a new product added, the Matrix Report would add a column automatically without any code change (assuming the product column is layed out across columns).
    Kalman Toth, SQL Server & BI Training, SSAS, SSIS, SSRS; http://www.SQLUSA.com
    Saturday, July 04, 2009 5:08 PM
    Moderator
  •  As Plamen stated, there is no such thing as a native dynamic pivot where the number of values is unknown, so you will have to use dynamic sql to pivot the data, which can make your database more vulnerable to sql injection attack.



    Reporting Services has native dynamic pivot capabilities, it is called Matrix Report.

    If in a 3NF design (as suggested above), a new product added, the Matrix Report would add a column automatically without any code change (assuming the product column is layed out across columns).
    Kalman Toth, SQL Server & BI Training, SSAS, SSIS, SSRS; http://www.SQLUSA.com

    Okay, but how does a report help the OP?  The OP wants to create new columns for every new type of media that is created, oppose to storing the media type and access level in seperate tables.  The application currently expects the resultset to be returned with columns of data, not rows of data.  In this case the OP has no other choice than to use the SQL Server PIVOT function or use a cross tab query method.  I am sticking with my original post, that I believe the application will become more scalable, if the data is presented as rows in sepeate tables, not columns in a single table.
    http://jahaines.blogspot.com/
    Saturday, July 04, 2009 7:03 PM
    Moderator
  •   I am sticking with my original post, that I believe the application will become more scalable, if the data is presented as rows in sepeate tables, not columns in a single table.
    Relatonal database design 101! Lesson #1!

    Kalman Toth, SQL Server & BI Training, SSAS, SSIS, SSRS; http://www.SQLUSA.com
    Sunday, July 05, 2009 8:12 AM
    Moderator
  • Adam's response is obviously correct and answers the question. Plamen, your response was correct. However, you did not support what you were saying in a way that was clear to me and I was not ready to accept anyone's advice without understanding their perspective.

    1. Creating a seperate table would provide a more long term scalable solution. If there were a static number of products it could make sense to include them in the accounts table. However, because more products could be added at a later date it is more beneficial to create a seperate table.

    2. Pivoting the data would add extra overhead to the database layer and could leave the database open to an sql injection attack (if implemented dynamically) and is thus not worth any gains it would give the application layer.

    Thanks to everyone for their input.
    Tuesday, July 07, 2009 3:00 PM