table partition

  • Question

  • Hello,
    We are receiving around 100 million rows of data every hour in .csv files...
    Do you think it is a good idea to partition the table based on month for enhanced loading into the table from the .csv files?

    thank you

    Saturday, September 3, 2016 4:33 PM

Answers

  • We are receiving around 100 million rows of data every hour in .csv files...
    Do you think it is a good idea to partition the table based on month for enhanced loading into the table from the .csv files?

    Based on *month*?  That would put 75,000,000,000 rows into each *partition*.  How long is your planned (or current) retention period - one year?  More?  That is a LOT of rows.  Is this mostly archive storage that will hardly be queried, or is it going to be actively analyzed and reanalyzed?

    I'd think you would probably want to look at one partition per day, if you're going to use partitions at all ... if you are going to store all this data as rows in SQL Server at all.  (A minimal sketch of a daily partition scheme follows this answer.)

    What are the chances you could instead store this data in some lob/blob-ish fashion, maybe 100 rows per hour with a million values stored as a big XML value?

    Josh

    • Marked as answer by arkiboys Sunday, September 18, 2016 8:42 PM
    Sunday, September 4, 2016 2:23 AM
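
    As a rough illustration of the one-partition-per-day idea, here is a minimal sketch of a daily RANGE RIGHT partition function, scheme, and table. All names (pf_LoadDate, ps_LoadDate, dbo.SensorReadings) and columns are made up for the example, only a few boundary days are shown, and a real design would generate boundaries for the whole retention window and map partitions to filegroups to suit it:

    -- Hypothetical daily partitioning sketch; names and boundaries are placeholders.
    CREATE PARTITION FUNCTION pf_LoadDate (date)
    AS RANGE RIGHT FOR VALUES ('2016-09-01', '2016-09-02', '2016-09-03');  -- one boundary per day

    CREATE PARTITION SCHEME ps_LoadDate
    AS PARTITION pf_LoadDate ALL TO ([PRIMARY]);   -- or one filegroup per day

    CREATE TABLE dbo.SensorReadings
    (
        LoadDate     date   NOT NULL,   -- partitioning column
        ReadingId    bigint NOT NULL,
        ReadingValue float  NULL,
        CONSTRAINT PK_SensorReadings PRIMARY KEY CLUSTERED (LoadDate, ReadingId)
    ) ON ps_LoadDate (LoadDate);        -- clustered index is aligned with the scheme

    With roughly 2.4 billion rows arriving per day, even a daily partition is still very large, so the sketch is only a starting point.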

All replies

  • Maybe. Depends on how you query the data, how you age out old data, etc. Also, is it one flat dimension, or is there some master-detail to it? Generally, it is very difficult to answer such a question with that little input.

    But if you have 2.4 billion rows per day, that is quite a volume, and certainly partitioning comes to mind. But are those numbers real numbers you have today, or are they projected for a system that is in the design phase? (A sketch of the usual sliding-window approach to aging out data follows this reply.)

    Saturday, September 3, 2016 5:23 PM
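
    To illustrate the aging-out question, here is a minimal sketch of the usual sliding-window pattern, reusing the hypothetical pf_LoadDate / dbo.SensorReadings names from the sketch above. The switch target must be empty, have a matching structure and indexes, and sit on the same filegroup as the partition being switched out:

    -- Hypothetical sketch: switch the oldest day out, then drop its boundary.
    CREATE TABLE dbo.SensorReadings_Archive
    (
        LoadDate     date   NOT NULL,
        ReadingId    bigint NOT NULL,
        ReadingValue float  NULL,
        CONSTRAINT PK_SensorReadings_Archive PRIMARY KEY CLUSTERED (LoadDate, ReadingId)
    ) ON [PRIMARY];

    -- Metadata-only switch of the oldest day's partition into the archive table.
    ALTER TABLE dbo.SensorReadings
        SWITCH PARTITION $PARTITION.pf_LoadDate('2016-09-01') TO dbo.SensorReadings_Archive;

    -- Remove the now-empty boundary so the partition function does not keep growing.
    ALTER PARTITION FUNCTION pf_LoadDate() MERGE RANGE ('2016-09-01');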
  • When you say enhanced loading, do you mean using SWITCH to load 100M rows at once from a staging table into a main table partitioned by hour?  Table partitioning may be a good idea in this case, but a short schema modification (SCH-M) lock will be needed during the operation, which cannot be granted while the table is being queried.  Also, as Erland mentioned, the nature of the queries is an important consideration.  More information will help us help you.  (A sketch of the staging-table SWITCH pattern follows this reply.)


    Dan Guzman, Data Platform MVP, http://www.dbdelta.com

    Saturday, September 3, 2016 6:17 PM
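
    For the staging-table pattern described above, here is a minimal sketch, again using the hypothetical names from the earlier sketches. The file path, format options, and check constraint values are placeholders; the staging table must match the main table's structure, indexes, and filegroup, and the target partition must be empty before the switch:

    -- Hypothetical sketch: bulk load a day's .csv files into an empty staging
    -- table, then switch the whole set into the partitioned table in one operation.
    CREATE TABLE dbo.SensorReadings_Stage
    (
        LoadDate     date   NOT NULL,
        ReadingId    bigint NOT NULL,
        ReadingValue float  NULL,
        CONSTRAINT PK_SensorReadings_Stage PRIMARY KEY CLUSTERED (LoadDate, ReadingId),
        -- Proves to the engine that every row belongs in the target partition.
        CONSTRAINT CK_SensorReadings_Stage_Day CHECK (LoadDate = '2016-09-02')
    ) ON [PRIMARY];   -- same filegroup as the target partition

    -- Load one hour's file (path and options are placeholders; TABLOCK helps
    -- enable minimal logging under the appropriate recovery model).
    BULK INSERT dbo.SensorReadings_Stage
    FROM 'C:\feeds\readings_2016090200.csv'
    WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n', TABLOCK, BATCHSIZE = 1000000);

    -- Metadata-only switch into the main table; needs the brief SCH-M lock noted above.
    ALTER TABLE dbo.SensorReadings_Stage
        SWITCH TO dbo.SensorReadings PARTITION $PARTITION.pf_LoadDate('2016-09-02');

    The SWITCH itself moves no data, which is what makes loading 100M-row batches this way attractive.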