locked
Table Design in case of High data volume RRS feed

  • Question

  • We have a table Structure in our database which holds the Country, Section, Hourly Ad-request Data.

    Structure is similar to CountryID(int), SectionID(int), Hour(int), TotalRequest(bigint).

    On per day basis we are currently accumulating 75,000 new rows in the table.

    Now our client wants to break-down the data at City Level.

    Structure will be CityID(int), SectionID(int), Hour(int), TotalRequest(bigint).

    After analysis we found it will increase no. of rows 400 time/day basis. (ie. 30000000 new rows/day (approx.)

    So it is practically impossible to put it in a single table. What is the best possible solution in case of table design?

    We are using SQL Server 2008 Standard edition.

    Wednesday, October 31, 2012 12:30 PM

Answers

  • Hi PradiptaKGhosh.

    You need to use poor mans table partitioning, no need to buy enterprise edition.

    I'm assuming the CityID is based on a serious of values, e.g. 1-10, 11-20, 21-50 and so on.

    Basically create several tables for each split, e.g.

      CityGroupA   containing CityID(int), SectionID(int), Hour(int), TotalRequest(bigint)
      CityGroupB   containing CityID(int), SectionID(int), Hour(int), TotalRequest(bigint)
    and so on.
    You need to have a PRIMARY KEY on each table (CityID, SectionID, Hour) along with a CHECK constraint on CityID,  e.g.  CityID>=1 AND CityID<=10

    At this point you should have several tables each with Check Constraint and primary keys.

    Then create a partitioned view which basically contains a SELECT statement for each table.

    CREATE VIEW City
    AS
      Select CityID, SectionID, Hour,TotalRequest
      FROM CityGroupA 
      UNION ALL
      Select CityID, SectionID, Hour,TotalRequest
      FROM CityGroupB

    Insert/select records using the City view and SQL will use the correct table.


    Micatio Software Free IIS Azure Web Log App


    • Edited by Ron - Micatio Saturday, November 3, 2012 5:42 PM
    • Marked as answer by Maggie Luo Thursday, November 8, 2012 3:03 AM
    Saturday, November 3, 2012 5:39 PM

All replies

  • >>>>Now our client wants to break-down the data at City Level

    I see you cannot use partitioning , how do you know that number of rows will be increased, sorry I am missing something....



    Best Regards,Uri Dimant SQL Server MVP, http://sqlblog.com/blogs/uri_dimant/

    MS SQL optimization: MS SQL Development and Optimization
    MS SQL Blog: Large scale of database and data cleansing
    MS SQL Consultants: Improves MS SQL Database Performance

    Wednesday, October 31, 2012 12:46 PM
  • Something doesn't seem right.  How do you differentiate today's rows from yesterday's rows? 

    In any event, the answer to your question is based on how you use the data and what the data represents.  Since you did not post that information, it is not possible to guess.  Partitioning is the logical answer - but, again, too little information to guess how that might be accomplished. BOL discusses partitioning and I suggest you start there.  Perhaps you are on the cusp of transitioning from a OLTP environment to a OLAP environment.

    If you really do plan on handling large volumes of data, then I suggest you hire someone with expertise in this area as it can be very complicated to design and support.  You need expertise at the database, OS, disk, and network levels to do this effectively - and 30 million rows per day will cause a rapid in size. 

    Wednesday, October 31, 2012 1:17 PM
  • Something doesn't seem right.  How do you differentiate today's rows from yesterday's rows? 

    Yes I have missed it,

    Current Table Structure is CountryID(int), SectionID(int), Request Date(DateTime), Hour(int), TotalRequest(bigint).

    Proposed Table Structure is CityID(int), SectionID(int), Request Date(DateTime), Hour(int), TotalRequest(bigint).

    We are Capturing data on per AdRequest. We are basically using GeoTargeting at Country, Region & CityLevel. Based on IP Information acquired from Client, we are able to find out the Country, Region & City Information.

    This data are going to shown in different reports.

    The problem is we are using SQL Server 2008 Standard edition. and client is reluctant to buy Enterprise edition.

    So I really require other viable alternative solution.

    Wednesday, October 31, 2012 2:33 PM
  • I see you cannot use partitioning , how do you know that number of rows will be increased, sorry I am missing something....

    A country has multiple Cities.

    So example row of of Current Scenario

    UK(Country),12311(SectionID),  2012-10-31(date), 20(hour),5000(Request)

    example. row of of Proposed Scenario

    UKCity1(City),12311(SectionID),  2012-10-31(date), 20(hour),500(Request)

    UKCity2(City),12311(SectionID),  2012-10-31(date), 20(hour),1500(Request)

    UKCity3(City),12311(SectionID),  2012-10-31(date), 20(hour),2000(Request)

    UKCity5(City),12311(SectionID),  2012-10-31(date), 20(hour),1000(Request)

    So same 5000 (Request) is broken by City Wise.

    So after analyzing the historical data we found an approx. increase of 400 times.

    Wednesday, October 31, 2012 2:42 PM
  • My suggestion remains.  Go hire the appropriate expertise.  A viable alternative will require an intimate knowledge of your data, how it is used, the resources you have or can provide, etc.  Partitioning may be helpful, but on what basis I cannot say. 

    One other issue you will need to examine is how (or if) you intend to migrate your existing data.  Seems your new structure drops CountryID.

    Wednesday, October 31, 2012 5:10 PM
  • Hi PradiptaKGhosh.

    You need to use poor mans table partitioning, no need to buy enterprise edition.

    I'm assuming the CityID is based on a serious of values, e.g. 1-10, 11-20, 21-50 and so on.

    Basically create several tables for each split, e.g.

      CityGroupA   containing CityID(int), SectionID(int), Hour(int), TotalRequest(bigint)
      CityGroupB   containing CityID(int), SectionID(int), Hour(int), TotalRequest(bigint)
    and so on.
    You need to have a PRIMARY KEY on each table (CityID, SectionID, Hour) along with a CHECK constraint on CityID,  e.g.  CityID>=1 AND CityID<=10

    At this point you should have several tables each with Check Constraint and primary keys.

    Then create a partitioned view which basically contains a SELECT statement for each table.

    CREATE VIEW City
    AS
      Select CityID, SectionID, Hour,TotalRequest
      FROM CityGroupA 
      UNION ALL
      Select CityID, SectionID, Hour,TotalRequest
      FROM CityGroupB

    Insert/select records using the City view and SQL will use the correct table.


    Micatio Software Free IIS Azure Web Log App


    • Edited by Ron - Micatio Saturday, November 3, 2012 5:42 PM
    • Marked as answer by Maggie Luo Thursday, November 8, 2012 3:03 AM
    Saturday, November 3, 2012 5:39 PM