locked
How to store statistics data? RRS feed

  • Question

  • Hi

    Does anyone here have suggestions on how to store statistics data?

    We want to store statistics about our users.

    Not just the total statistics, but they should be able to combine different attributes.

    For example.
    "number of users in new york between 2 april  and 3 april 2010"
    "number of users between 40-50 years old from new york with a photo between 1 april and 1 may 2010"
    "number of users that is 47 years old from miami dallas or that does not have a photo between 10 may and 20 may 2010"

    As you can see there are several attributes, such as city, age, photo. And of course the number of users with those attributes for a specific time period.
    What are the best practices here?

    • Changed type KJian_ Monday, November 22, 2010 2:16 AM
    Sunday, November 21, 2010 7:06 PM

Answers

  • It sounds like you have slowly varying time series data, and you want to preserve past history in a way that allows you to later generate statistics from it?  The 'daily snapshot' method is the simplest, but will probably lead to excessive amounts of data being generated, as you'll need a new row for each user for each day that passes (or more, if your time interval is smaller).

    The typical way you store such time series is by creating a new row only when an attribute changes, with the start and stop dates attached to each row, indicating when the attribute set is valid.   For instance, if a user moves from Dallas to NYC on May 1:

     

    USER01 Dallas   April 1 2007  April 30 2010

    USER01 New York City  May 01 2010 Nov 21 2010

    You have to create a new entry every time *any* attribute changes, so if they didn't have a photo on July 1 but did on July 2, you need a new row.  Age continually changes, but its best to calculate that based simply off (fixed attribute) birthdate, rather than storing current age as a slowly-changing attribute.

     


    Michael Asher
    • Marked as answer by KJian_ Friday, November 26, 2010 7:38 AM
    Sunday, November 21, 2010 8:33 PM
  • This is well-worked out topic in Data Warehousing. Check out the following link. Let us know if helpful.

     Slowly changing dimension

     Also checkout the EmployeeDepartmentHistory table in AdventureWorks2008 sample database. 

     

     


    Kalman Toth, SQL Server & Business Intelligence Training; SQL 2008 GRAND SLAM

     

    • Marked as answer by KJian_ Friday, November 26, 2010 7:38 AM
    Thursday, November 25, 2010 6:40 PM

All replies

  • I want to store historical data for each day. It's like snapshots of our current data at the same time every day.

    I dont understand how views would help me here?

    Sunday, November 21, 2010 8:05 PM
  • One user may live in miami one day (for example 2 april 2010). But move to dallas on april 3.

    This means that if we need to count numbers of users in miami it should be 1 for the 2 april, but 0 on april 3.

    Sunday, November 21, 2010 8:25 PM
  • It sounds like you have slowly varying time series data, and you want to preserve past history in a way that allows you to later generate statistics from it?  The 'daily snapshot' method is the simplest, but will probably lead to excessive amounts of data being generated, as you'll need a new row for each user for each day that passes (or more, if your time interval is smaller).

    The typical way you store such time series is by creating a new row only when an attribute changes, with the start and stop dates attached to each row, indicating when the attribute set is valid.   For instance, if a user moves from Dallas to NYC on May 1:

     

    USER01 Dallas   April 1 2007  April 30 2010

    USER01 New York City  May 01 2010 Nov 21 2010

    You have to create a new entry every time *any* attribute changes, so if they didn't have a photo on July 1 but did on July 2, you need a new row.  Age continually changes, but its best to calculate that based simply off (fixed attribute) birthdate, rather than storing current age as a slowly-changing attribute.

     


    Michael Asher
    • Marked as answer by KJian_ Friday, November 26, 2010 7:38 AM
    Sunday, November 21, 2010 8:33 PM
  • ok, thank you!
    Sunday, November 21, 2010 8:38 PM
  • This is well-worked out topic in Data Warehousing. Check out the following link. Let us know if helpful.

     Slowly changing dimension

     Also checkout the EmployeeDepartmentHistory table in AdventureWorks2008 sample database. 

     

     


    Kalman Toth, SQL Server & Business Intelligence Training; SQL 2008 GRAND SLAM

     

    • Marked as answer by KJian_ Friday, November 26, 2010 7:38 AM
    Thursday, November 25, 2010 6:40 PM
  • This a great dimensional design example. Dimensions are Time, Location, User. Fact table will be the measurements...your statistics. Another good dimension is the user information related to their sessions, logins, time elaspsed, click thoughs etc etc
    Friday, November 26, 2010 8:46 PM