none
Time Series design

    Question

  • I have a table with the following data:

    StoreNumber, SalesDate, ProductID, SizeID, Quantity

    This table contains data for the last 3 years.  The primary key on the tables is StoreNumber, SalesDate, ProductID, SizeID.  The sales date can be broken down to hours if I choose, but at the moment, I'm only concerned about the day level.  StoreNumber, ProductID, and SizeID are all integers and all are discrete values.  What I want to be able to do is the following:  Predict the quantity of each size of product sold for a given store for 1, 3, 5, 7, and 14 days into the future.

    Stores can sell as many as 350 different product/size combinations in any given day.  I want to be able to predict an entire days sales - all possible products predicted to be sold and the quantities of each.

    I also need to be able to pull the results of the prediction into a SQL Server table that I can then utilize with additional data to derive information to be delivered to the supply chain.

    How do I go about setting up the mining model and then how do I go about dumping the results into a table in SQL Server?


    Mike Hotek mhotek@mssqlserver.com
    Friday, December 17, 2010 4:52 PM

Answers

  • Hi Michael -- 

    I will help whittle away toward a solution.  Actually I have seven answers, though I don't know my response would be marked as a single answer since this thread has several answers already indicated.

    1) What are the key, input, and predict variables? 

    From http://msdn.microsoft.com/en-us/library/ms174923.aspx

    The requirements for a time series model are as follows:

    • A single key time column    Each model must contain one numeric or date column that is used as the case series, which defines the time slices that the model will use. The data type for the key time column can be either a datetime data type or a numeric data type. However, the column must contain continuous values, and the values must be unique for each series. The case series for a time series model cannot be stored in two columns, such as a Year column and a Month column.

    • A predictable column    Each model must contain at least one predictable column around which the algorithm will build the time series model. The data type of the predictable column must have continuous values. For example, you can predict how numeric attributes, such as income, sales, or temperature, change over time. However, you cannot use a column that contains discrete values, such as purchasing status or level of education, as the predictable column.

    • An optional series key column    Each model can have an additional key column that contains unique values that identify a series. The optional series key column must contain unique values. For example, a single model can contain sales for many product models, as long as there is only one record for each product name for every time slice.

    In your case, the key time is the day (assuming single day), the predictable column is the quantity, and the optional series key column could be used (though I would instead apply a filter to the data).

    2) Should 3,500 models be created? 

    The answer is it depends.  If you have a client willing to pay for all that work, then the answer is yes.  I once tracked home improvement stores for GE Capital, and they wanted individual and aggregated results for several hundred home improvement stores.  Part of the advantage of thinking about data mining is working toward sets, and in your case, a more feasible option would be to make fewer models than 3,500 but more than one.

    3) If you want to do fewer than 3,500 stores then how could you proceed?

    One way is to cluster them by geography, or management hierarchy.  On marktab.net , I promote data mining for actionable business decisions, and the question first is what type of decision are you trying to make?  In your problem, the goal is a quantity, but for what purpose?  To reallocate by geography?  To reallocate management?  If there is no generic purpose, you can use clustering to choose the store groupings for you based on empirical evidence, and then proceed to combine store results together.

    4) Should all sales for a store be combined in a single predictive model?

    Again, the answer is it depends.  Highly correlated sales may indicate interaction.  However, in a general problem, you might discover correlation among days and among sales types.  In absence of any empirical reason, the sales should remain together.

    5) Should products be clustered into a single product type?

    Perhaps someone was expecting single product output for single stores.  You may not have enough evidence based on specific SKUs (product IDs) to make an assessment.  Microsoft has a Contoso Retail dataset, and when I have used it for data mining presentations, I note that the products listed may not be of specific interest in terms of prediction.  For example, does someone care (in light of actionable business decisions) if a mouse pad sale was for a dark or light green mousepad (two separate products).  Perhaps the products can be combined.

    6) How do you put the results of time series into a table?

    The answer is at:  http://msdn.microsoft.com/en-us/library/bb677208.aspx

    7) What do-it-yourself resources are available beyond people who consult for a living (such as marktab.com )?

    First, I hope you are intending to use all the algorithms.  I have talked about this topic a lot on marktab.net, and have been encouraging people to know what all the algorithms do, and use all the tools including Integration Services to maximize processing efficiency.  SSIS is my recommended place to generate many models -- abstract the filtering criteria (such as store or group of stores) and use SSIS to iterate through the combinations and even through querying the data mining models.  In this reply, I have said you can combine Time Series with Clustering, and I am also saying use SSIS and SSAS.

    I like Jamie MacLellan's book so much, I did a chapter-by-chapter interview:

    http://www.marktab.net/datamining/index.php/2010/05/29/introduction-to-data-mining-in-sql-server-2008/

    However, I am also recommending another book which helps provide insight into how to frame data mining concepts, called Data Mining for Business Intelligence:

    http://www.marktab.net/datamining/index.php/2010/11/05/data-mining-for-business-intelligence-book-review/

    And you can read my interview with that book's author Galit Shmueli:

    http://www.marktab.net/datamining/index.php/2010/11/17/galit-shmueli-interview-lead-author-of-data-mining-for-business-intelligence/

     

    Wednesday, January 05, 2011 5:58 AM

All replies

  • The answer to this question is really to go out and read some literature on the SQL Server Data Mining platform, inlcuding the Time Series algorithm and DMX.  Data Mining with SQL Server 2008 is the only book that would fill this bill.

    Once you've done that, there are still some specific issues regarding this project.  It's not really something that can be answered in a simple forum post.

    That being said, I would make seperate models for each store on the data rolled up by day and product.   Your resulting data will be Store/Day/Product/Sales Amount.  Divide the data by store before feeding to the algorithm.

     The default parameters for the Time Series algorithm should be fine, except I would add periodicity hints of 7 and 29 to indicate weekly and monthly patterns (29 is an approximation that will likely work well). 

    After you build the models, you can use DMX to predict out as many days as you wish.  You can use SQL Server Integration Services or a linked server and DMX to get the results back into SQL Tables.

    My guesstimate is that if you know how to do all the parts, it will take you 2-4 weeks to get this set up.  If not, it will take a bit longer.

     

    -Jamie


    ______________________________
    Jamie MacLennan
    CTO
    Predixion Software, Inc.
    My Blog
    Follow on Twitter
    Email Me
    Monday, December 20, 2010 4:03 AM
  • I will just add another comment. Since you want to forecast the sales on product level per day, then it is (often) essential to include the price as a predictor.

    Typically up to 70% of the variation in sales is due to price variations, and as price variations are planned then they a easy to forecast.

    You if just go for a standard ARIMA implementation I guess you will get a very poor prediction.

    Best Regards

    Kim Lillesøe

    Dataminds

     

    Tuesday, December 21, 2010 1:47 PM
  • I've been through that book, multiple times, and I'm just not getting anything useful.  I've also been through every article on your website, hundreds of posts on this forum, hundreds of articles on other sites, and I'm just not finding anything that points me to producing a useable model, for even 1 store, let alone 3500 stores. I'm still stuck on which columns to specify for key, input, and predict to even be able to get meaningful data back out.  So far, everything I've tried has produced garbage.  

     

    I was also hoping to be able to avoid creating a separate model for each store, because that would mean the creation of approximately 3500 models.  While the creation of the models isn't an issue, since I can easily write a code generator in T-SQL to dump out the DMX required to create the models, processing them would be an entirely different matter.


    Mike Hotek mhotek@mssqlserver.com
    Tuesday, January 04, 2011 3:57 PM
  • It seems to me like you have skipped the design and want the model to be able to predict future sales only based on historic sales.

    If you what to proceed this way you might as well go for some simple model - smoothing technics. The math for these models are pretty simpel and can be programmed in SQL -> close to zero processing time.

    However if you add some design to your setup, then I think you will geta better model. Evaluate the relevante drivers price/season and estimate seperate models based on a regression technic. Price is typically something you plan and therefore it is ideal to use in forecasting.

    I doubt the ARIMA model will perform better than the simple model in this case.

    I hope this helps.

    Best regards

    Kim Lillesøe

    Dataminds

     

     

     

    Tuesday, January 04, 2011 10:41 PM
  • Hi,

    your question is a reality problem, the similarity things exist in data mining everywhere. To do a relative good analysis, we should understand data structure except analysis subject.

    From your question, I think, you could set a model for each store by using arima method,thus, you may set 3500 models, the task is horrible.  Using Filter parameter, you could select storeid.

    The other method is to set a general model for stores. This analysis method is called Panel Data Analysis, MS calls it as ARTXP, this is to say , you could use some time serial data as input, and use storeid to mark each store.  (Maybe , that I name ARTXP as Panel Data Analysis is not correct, I don't get the ARTXP method details).

    You may try the methods listed before. If you don't get a reasonable result, you may try SAS PDLREG procedure, which will give you enough information about the method.

     

    Rockfiller

     

     


    Rockfiller
    Wednesday, January 05, 2011 1:19 AM
  • Hi Michael -- 

    I will help whittle away toward a solution.  Actually I have seven answers, though I don't know my response would be marked as a single answer since this thread has several answers already indicated.

    1) What are the key, input, and predict variables? 

    From http://msdn.microsoft.com/en-us/library/ms174923.aspx

    The requirements for a time series model are as follows:

    • A single key time column    Each model must contain one numeric or date column that is used as the case series, which defines the time slices that the model will use. The data type for the key time column can be either a datetime data type or a numeric data type. However, the column must contain continuous values, and the values must be unique for each series. The case series for a time series model cannot be stored in two columns, such as a Year column and a Month column.

    • A predictable column    Each model must contain at least one predictable column around which the algorithm will build the time series model. The data type of the predictable column must have continuous values. For example, you can predict how numeric attributes, such as income, sales, or temperature, change over time. However, you cannot use a column that contains discrete values, such as purchasing status or level of education, as the predictable column.

    • An optional series key column    Each model can have an additional key column that contains unique values that identify a series. The optional series key column must contain unique values. For example, a single model can contain sales for many product models, as long as there is only one record for each product name for every time slice.

    In your case, the key time is the day (assuming single day), the predictable column is the quantity, and the optional series key column could be used (though I would instead apply a filter to the data).

    2) Should 3,500 models be created? 

    The answer is it depends.  If you have a client willing to pay for all that work, then the answer is yes.  I once tracked home improvement stores for GE Capital, and they wanted individual and aggregated results for several hundred home improvement stores.  Part of the advantage of thinking about data mining is working toward sets, and in your case, a more feasible option would be to make fewer models than 3,500 but more than one.

    3) If you want to do fewer than 3,500 stores then how could you proceed?

    One way is to cluster them by geography, or management hierarchy.  On marktab.net , I promote data mining for actionable business decisions, and the question first is what type of decision are you trying to make?  In your problem, the goal is a quantity, but for what purpose?  To reallocate by geography?  To reallocate management?  If there is no generic purpose, you can use clustering to choose the store groupings for you based on empirical evidence, and then proceed to combine store results together.

    4) Should all sales for a store be combined in a single predictive model?

    Again, the answer is it depends.  Highly correlated sales may indicate interaction.  However, in a general problem, you might discover correlation among days and among sales types.  In absence of any empirical reason, the sales should remain together.

    5) Should products be clustered into a single product type?

    Perhaps someone was expecting single product output for single stores.  You may not have enough evidence based on specific SKUs (product IDs) to make an assessment.  Microsoft has a Contoso Retail dataset, and when I have used it for data mining presentations, I note that the products listed may not be of specific interest in terms of prediction.  For example, does someone care (in light of actionable business decisions) if a mouse pad sale was for a dark or light green mousepad (two separate products).  Perhaps the products can be combined.

    6) How do you put the results of time series into a table?

    The answer is at:  http://msdn.microsoft.com/en-us/library/bb677208.aspx

    7) What do-it-yourself resources are available beyond people who consult for a living (such as marktab.com )?

    First, I hope you are intending to use all the algorithms.  I have talked about this topic a lot on marktab.net, and have been encouraging people to know what all the algorithms do, and use all the tools including Integration Services to maximize processing efficiency.  SSIS is my recommended place to generate many models -- abstract the filtering criteria (such as store or group of stores) and use SSIS to iterate through the combinations and even through querying the data mining models.  In this reply, I have said you can combine Time Series with Clustering, and I am also saying use SSIS and SSAS.

    I like Jamie MacLellan's book so much, I did a chapter-by-chapter interview:

    http://www.marktab.net/datamining/index.php/2010/05/29/introduction-to-data-mining-in-sql-server-2008/

    However, I am also recommending another book which helps provide insight into how to frame data mining concepts, called Data Mining for Business Intelligence:

    http://www.marktab.net/datamining/index.php/2010/11/05/data-mining-for-business-intelligence-book-review/

    And you can read my interview with that book's author Galit Shmueli:

    http://www.marktab.net/datamining/index.php/2010/11/17/galit-shmueli-interview-lead-author-of-data-mining-for-business-intelligence/

     

    Wednesday, January 05, 2011 5:58 AM