Add additional address type to existing table

    Question

  • Suppose I have a table that has been in live prod for many years:

    [UserContactInfo]
    [UserContactInfoId] uniqueidentifier not null
    [UserId] uniqueidentifier not null FK to the [User] table
    [FirstName] varchar(50) null
    [LastName] varchar(50) null
    [Street] varchar(200) null
    [City] varchar(50) null
    [State] varchar(2) null
    [Zip] varchar(10) null
    many other fields

    Currently the maximum row byte count is only about 1K -- far less than the 8K per-row limit.

    Currently there are around 10 million rows, or about 9 GB.

    Let's say that data migration is very painful due to the large table size, the necessity of running all operations in transactions, and the business negatives of an extensive database maintenance window.

    Now I want to add a mailing address.

    Option 1: I could add additional columns:

    [MailingStreet] varchar(200) null
    [MailingCity] varchar(50) null
    [MailingState] varchar(2) null
    [MailingZip] varchar(10) null
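
    For what it's worth, the Option 1 schema change itself should be quick, since adding nullable columns without defaults is a metadata-only operation in SQL Server; a minimal sketch, assuming the dbo schema:

    ALTER TABLE dbo.UserContactInfo
        ADD MailingStreet varchar(200) NULL,
            MailingCity   varchar(50)  NULL,
            MailingState  varchar(2)   NULL,
            MailingZip    varchar(10)  NULL;
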
    Option 2: I could type the address:

    [AddressType] tinyint not null
    corresponding to a C# enum enforced on write

    enum AddressType {
        Physical = 1,
        Mailing = 2
    }
    and during a db maintenance window run a script to update all existing rows to [AddressType] = 1
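
    A minimal sketch of how that rollout could look (the table/constraint names and the batch size are just illustrative; the backfill runs in small batches to keep each transaction short):

    -- 1. Add the column as NULLable first (a quick metadata-only change).
    ALTER TABLE dbo.UserContactInfo ADD AddressType tinyint NULL;

    -- 2. Backfill existing rows in small batches so no single transaction is huge.
    WHILE 1 = 1
    BEGIN
        UPDATE TOP (50000) dbo.UserContactInfo
        SET AddressType = 1            -- AddressType.Physical
        WHERE AddressType IS NULL;

        IF @@ROWCOUNT = 0 BREAK;
    END;

    -- 3. Once every row is populated, tighten the definition.
    ALTER TABLE dbo.UserContactInfo
        ADD CONSTRAINT DF_UserContactInfo_AddressType DEFAULT (1) FOR AddressType;
    ALTER TABLE dbo.UserContactInfo ALTER COLUMN AddressType tinyint NOT NULL;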

    My question is: which option will perform better?

    Option 1 negatives:

    (a) Initially all rows will have 4 unused columns, and even after many years it will likely be a low % of rows that have actual data in them. But I believe SQL Server only requires a single bit to record that a column is null.
    (b) It moves the row slightly closer to the 8K boundary (albeit still a long way off).
    (c) If we need another address type it means more columns.
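
    To put a number on (b), here is one way to check how close rows currently get to the 8K boundary (a diagnostic sketch only, assuming the dbo schema and that a DETAILED scan during a quiet period is acceptable):

    SELECT index_id, index_level,
           avg_record_size_in_bytes,
           max_record_size_in_bytes
    FROM sys.dm_db_index_physical_stats(
             DB_ID(), OBJECT_ID('dbo.UserContactInfo'),
             NULL, NULL, 'DETAILED')
    WHERE index_level = 0;   -- leaf level, i.e. the actual data rows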

    Option 2 negatives:

    (a) Potentially, sometime in the future the table could have twice as many rows. I could add a compound index on [UserId], [AddressType], and I'm sure SQL Server has various tricks to optimize performance, but I doubt it would ever approach the performance of a table half the size.
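
    For reference, the compound index mentioned in (a) might look something like this (the index name is just illustrative):

    CREATE NONCLUSTERED INDEX IX_UserContactInfo_UserId_AddressType
        ON dbo.UserContactInfo (UserId, AddressType);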

    Does anyone have better insight into the trade-offs of Option 1 vs Option 2?

    Monday, July 9, 2018 5:38 AM

All replies

  • 1. 10 million rows is not considered a big table nowadays

    2. I am against adding columns that are not in use at all

    3. Consider having a separate table named AddressTypes and just have its ID referenced in the ContactInfo table (see the sketch after this list)

    4. In order to speed up queries you need properly defined indexes, so it looks like Option 2 :-)))))
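
    A rough sketch of what point 3 could look like (table and constraint names are just illustrative, and it assumes the [AddressType] column from Option 2 is already in place):

    CREATE TABLE dbo.AddressTypes
    (
        AddressTypeId tinyint     NOT NULL PRIMARY KEY,
        Name          varchar(50) NOT NULL
    );

    INSERT INTO dbo.AddressTypes (AddressTypeId, Name)
    VALUES (1, 'Physical'), (2, 'Mailing');

    ALTER TABLE dbo.UserContactInfo
        ADD CONSTRAINT FK_UserContactInfo_AddressTypes
        FOREIGN KEY (AddressType) REFERENCES dbo.AddressTypes (AddressTypeId);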


    Best Regards, Uri Dimant, SQL Server MVP, http://sqlblog.com/blogs/uri_dimant/

    MS SQL optimization: MS SQL Development and Optimization
    MS SQL Consulting: Large scale of database and data cleansing
    Remote DBA Services: Improves MS SQL Database Performance
    SQL Server Integration Services: Business Intelligence

    Monday, July 9, 2018 5:59 AM
    Moderator
  • If you want maximum flexibility you can go for Option 2. Anytime you want, you can add/remove/modify an AddressType with this approach. It doesn't require any changes to the table structure, since all it takes is an insert/update/delete operation on the table.

    Option 1, on the other hand, would require you to keep adding/removing columns from the table (DDL operations) for any change in address type.

    The only downside I see with Option 2 is the requirement to pivot from rows to columns when you want to show different address types for an entity in a single result set. But with operators like PIVOT available in T-SQL, that won't be a difficult thing to achieve (you could even define a view to return the information in a flattened table structure if you want!).
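
    As an illustration, such a view could be sketched with conditional aggregation (an alternative to the PIVOT operator; the view name is just an example, and it assumes the Option 2 layout where 1 = Physical and 2 = Mailing):

    CREATE VIEW dbo.vUserAddresses
    AS
    SELECT UserId,
           MAX(CASE WHEN AddressType = 1 THEN Street END) AS PhysicalStreet,
           MAX(CASE WHEN AddressType = 1 THEN City   END) AS PhysicalCity,
           MAX(CASE WHEN AddressType = 1 THEN State  END) AS PhysicalState,
           MAX(CASE WHEN AddressType = 1 THEN Zip    END) AS PhysicalZip,
           MAX(CASE WHEN AddressType = 2 THEN Street END) AS MailingStreet,
           MAX(CASE WHEN AddressType = 2 THEN City   END) AS MailingCity,
           MAX(CASE WHEN AddressType = 2 THEN State  END) AS MailingState,
           MAX(CASE WHEN AddressType = 2 THEN Zip    END) AS MailingZip
    FROM dbo.UserContactInfo
    GROUP BY UserId;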


    Please Mark This As Answer if it solved your issue
    Please Vote This As Helpful if it helps to solve your issue
    Visakh
    ----------------------------
    My Wiki User Page
    My MSDN Page
    My Personal Blog
    My Facebook Page

    Monday, July 9, 2018 6:00 AM