none
Time Series only predicts one step

    Question

  • I have a relational table with daily sales for 364 days (52 week span) for 60 stores. I have created a Microsoft Time Series model using store as the case and the historical/actual line charts appear in the Charts viewer for each store. But only one prediction step is shown no matter how many prediction steps I select. I have tried this with an OLAP-based model and also tried a simple DMX query, all with the same result.

    Thank you in advance for any help with this.

    P Taylor

    Friday, September 15, 2006 1:21 PM

Answers

  • I asked Alexei Bocharov, the developer who implemented the TS algorithm and he came back with the following response:

    "Sorry for the delay.

     

    I played with this data and found it fairly volatile.

    I have also found that at complexity penalty 0.96 one gets a cruder model that does support long range prediction (but of course one gets larger StdDevs with that).

    It looks like below this complexity penalty the model is overfitting the volatile data which makes it unsuitable for the long-range prediction.

    In practice it is advisable to keep two models – one with complexity penalty  0.95 or lower for the first prediction, and one with complexity penalty 0.96 or higher for long range forecasting."

     

    Hope this helps.

    Tuesday, September 26, 2006 3:07 AM

All replies

  • Can you provide more information on your scenario?

    1. The schema for your relational table.

    2. The schema for the mining structure, including columns marked as key and key time

    Thanks

    Shuvro

    Saturday, September 16, 2006 1:48 AM
  • Shuvro,

    The relational table has 3 fields:

    FiscalDate - smalldatetime example: 2/20/2004 12:00:00 AM
    Store - nvarchar(50) example: "5114 - Mall of America"
    Sales: int   example: 29326 (these numbers are received as integers)

    Mining Structure:
    Fiscal Date (Key Time, type Date);               Store (Key, typeText);                        Sales (Continuous, type Long);

    Mining model (Microsoft Time Series):
    Fiscal Date (Key Time, type Date);               Store (Key, typeText);                
    Sales (Continuous, type Long, Predict and Regressor);

    Algorithm Parameters:
    MAXIMUM_SERIES_VALUE: 40000
    MINIMUM_SERIES_VALUE: 0
    MISSING_VALUE_SUBSTITUTION: Mean
    PERIODICITY_HINT  {7,364}

    Others: Default 

    The relational table actually has 728 days (=104 weeks or 2 fiscal years) across 60 stores. Periodicity is weekly (7) and by year (364)

    Many thanks and please let me know if you need additional information.

    Philip Taylor

     

     

                                                                  

    Monday, September 18, 2006 12:51 PM
  • Thanks for the information. Can you try the following:

    1. Set PERIODICITY_HINT setting to {7} and see if you get additional prediction steps?

    2. Set AUTO_DETECT_PERIODICITY to 0.8 and see if you get additional prediction steps?

    Thanks

    Shuvro

    Tuesday, September 19, 2006 12:06 AM
  • Hi Shuvro,

    I tried both of your recommended changes to Algorithm Parameters (individually and both together). Unfortunately the model still only yields one prediction step. I also tried AUTO_DETECT_PERIODICITY at 1.0. I am currently trying to compare my approach with the AdventureWorks data mining tutorial time series model (which does yield several prediction steps) to see what I am doing differently.

    Thank you for your help

    Philip Taylor

    Tuesday, September 19, 2006 1:37 PM
  • The nature of auto-regression trees is that they can generate unstable predictions.  Our algorithm alleviates this issue by truncating predictions that appear to be losing stability.  Our heuristic for this may be too aggressive and we are working on fixing it.  Is it possible for you to paste your series here so we can test with your data?

     

    Thanks

    Tuesday, September 19, 2006 4:04 PM
  • Jamie,

    I have e-mailed a CSV file to you, of data that I have been using.

    Many thanks

    Philip Taylor

    Tuesday, September 19, 2006 8:35 PM
  • I noticed that the trees have a very large number of splits which can lead to the instability of the algorithm.  You can control this by setting the COMPLEXITY_PENALTY and MINIMUM_SUPPORT parameters higher.

    That's something to try for now, we'll continue to look into the problem.

     

    Thanks

    -Jamie

    Wednesday, September 20, 2006 4:48 PM
  • I asked Alexei Bocharov, the developer who implemented the TS algorithm and he came back with the following response:

    "Sorry for the delay.

     

    I played with this data and found it fairly volatile.

    I have also found that at complexity penalty 0.96 one gets a cruder model that does support long range prediction (but of course one gets larger StdDevs with that).

    It looks like below this complexity penalty the model is overfitting the volatile data which makes it unsuitable for the long-range prediction.

    In practice it is advisable to keep two models – one with complexity penalty  0.95 or lower for the first prediction, and one with complexity penalty 0.96 or higher for long range forecasting."

     

    Hope this helps.

    Tuesday, September 26, 2006 3:07 AM
  • Jamie and Alexei,

    I did get more predictions with a compexity penalty of 0.96. I have some more data sets from additional groups of stores that I plan experimenting with.

    Many thanks for your help.

    Philip Taylor

     

    Tuesday, September 26, 2006 2:42 PM
  • I have a similar experience with time series that have seasonal behavior as well as "drastic" upward or downward trends in the last few observations. Is there a way to control the heuristic that truncates the predictions? Is this based solely on standard deviation?
    Wednesday, December 06, 2006 11:36 PM
  • Hi

    I'm trying to forecast a telecommunications traffic database. This database is made of 9476 independent measuring entities. Each of this entities measures 11 types of data.
    In my first exploration of the data, i'm trying to forecast just one of the measures (the most important) in some of the entities.
    But in most of the entities, i have the same problem stated in this thread: I can just predict one step!

    The Mining Model is quite simple:
    TIME_KEY (Key Time, Double);   EX: 2005050406000003600 represents 2005-05-04 06:00:00 measuring 03600 seconds
    M50183 (Continuous, Double);  EX: 1345

    TIME KEY in this database is always hourly, and the training set has 5684 records, corresponding to 5684 sequential hours.
    Here is an example of one of the datasets: http://student.dei.uc.pt/~pcoliv/entity68923.csv
    And a print screen of a 100 prediction steps forecast: http://student.dei.uc.pt/~pcoliv/entity68923.PNG
    I've got always the same result, even by by setting the COMPLEXITY_PENALTY and MINIMUM_SUPPORT parameters higher as recommended. In some cases, i've managed to reduce an ART from 6 levels to just 1, and the results were the same.

    If someone got a clue of what's happening, i've be glad to heard it :)

    Thanks in advance,
    Pedro

    Friday, February 09, 2007 1:05 PM
  • I had a similar problem and find out that setting MINIMUM_SUPPORT = 5 fixed this problem in my case. The thing is that the model I made had too few cases so I lowered this parameter and the mining model started forecasting subsecuents steps.

    Hope this helps.
    Monday, February 19, 2007 8:54 PM
  • When I attempt to alter the COMPLEXITY_PENALTY algorithm parameter thru DMX or XMLA I get the following error:

     

    Error (Data mining): The 'COMPLEXITY_PENALTY' data mining parameter is not valid for the 'time' model.

     

    Is it not possible to alter that parameter from script code?
    Wednesday, June 20, 2007 6:08 PM
  • Are you using standard or enterprise edition of SQL Server 2005?
    Friday, June 22, 2007 3:09 PM
  • I'm using Standard edition with SP2.  I'm guessing from your question that those algorithm parameters are an Enterprise edition feature?  

     

    Another algorithm parameter issue - I cannot set the PERIODICITY_HINT parameter (for Time Series algorithm) in DMX, but I can set it in XMLA or from BIDS.  When I attenmpt to set it in DMX, I get:

     

    Query (11, 20) Parser: The syntax for '{' is incorrect.

    Friday, June 22, 2007 4:58 PM
  • you must use Developer Edittion
    Monday, February 23, 2009 9:12 AM
  • Can you specify the exact DMX statement?  If it were a version issue you would get an error stating the parameter isn't supported, not a syntax error.
    -- Jamie MacLennan -- SQL Server Data Mining -- http://blogs.msdn.com/jamiemac
    • Proposed as answer by Scorpion_it Wednesday, April 01, 2009 7:13 AM
    Tuesday, March 03, 2009 7:14 PM
  • Hi Jamie MacLennan !
    I have some another problem , I'm reseaching about The Microsoft Naive Bayes algorithm rely on your Data Mining  with SQL server 2005 , can you explain for me?. Is there any difference from your idea:
    "Let’s use the congressional records as an example to build up Bayes’ rule.

    First,  suppose  that  you  had  to  simply  guess  the  party  affiliation  of  a  con-

    gressperson  during  the  2002  congressional  sessions  without  any  additional

    information .

         First, you tackle the probability of the hypothesis given the evidence; in this

    case, this would be the probability that the representative is a Democrat given

    that she voted Yeah on the Death Tax Repeal, Help America Vote, and Child

    Abduction  Acts,  and  Nay  on  the  Homeland  Security  Act.  To  determine  this

    probability, you need to compute the probability of the evidence, given that your

    hypothesis  is  true.  This  is  simply  a  lookup  from  the  counts  presented  in

    Table 4.1. That is, your evidence states that the representative voted Yeah on the

    Help America Vote Act and your hypothesis is that the representative is a Demo-

    crat. From the table, you see that the probability of this piece of evidence is 94%.

    The probability of all the evidence given the hypothesis is simply the product of

    the  probabilities  of  each  individual  piece.  Next,  you  multiply  by  the  overall

    probability, the prior probability, of your hypothesis    in this case 49%."

    and Document i have read  SQL Server Book Online (How the Algorithm Works  of Microsoft Naive Bayes Algorithm) :

          "The Microsoft Naive Bayes algorithm calculates the probability of every state of each input column, given each possible state of the predictable column."

    In my opinion, this means: The Microsoft Naive Bayes algorithm calculates the probability of hypothesis is voted Yeah on the Help America Vote Act given the evidence is a Democrat. And receive result same to in Microsoft Naive Bayes Viewer.

    Hope you respond the sooner the better. Thanks!!

    Wednesday, April 01, 2009 7:19 AM