locked
distinct count with multiple partitions? RRS feed

  • Question

  • We analyze much of our data using distinct measures.  Some of the underlying sources - i.e. daily page views are heavy and have millions of rows a day.  Because of this I've created daily partitions to only process the incremental data that's arrived.  However, I'm curious - how does distinct count perform when it needs to rollup over multiple partitions?

    IE Say it's Dec 25th, and I have 25 unique partitions for each day of December thus far.  Internally, how do the distinct measures correctly accumulate the unique instances of my measure?  Are there any significant performance concerns to be aware of? 
    Wednesday, January 3, 2007 10:01 PM

Answers

  • Distinct Count measure will work fine with multiple partitions. AS keeps the distinct count values inside the partitions, and therefore it can correctly aggregate across them. Partition per day should be OK too.
    Wednesday, January 3, 2007 11:31 PM
  • Arjun: Make sure you are on at least SQL 2005 SP1 QFE rollup, preferably SP2 CTP2.  There were a couple of bugs fixed for distinct count measures over multiple partitions.
    Thursday, January 4, 2007 12:04 AM

All replies

  • Distinct Count measure will work fine with multiple partitions. AS keeps the distinct count values inside the partitions, and therefore it can correctly aggregate across them. Partition per day should be OK too.
    Wednesday, January 3, 2007 11:31 PM
  • Arjun: Make sure you are on at least SQL 2005 SP1 QFE rollup, preferably SP2 CTP2.  There were a couple of bugs fixed for distinct count measures over multiple partitions.
    Thursday, January 4, 2007 12:04 AM
  • Good to know, thanks!  Jeff, do you have any details on the nature of the issues with distinct counts over multiple partitions?
    Thursday, January 4, 2007 11:05 PM