ARIMA-only Prediction yeilds wild results
-
26 iunie 2012 20:09
In evaluating the performance of SQL 2008 R2 time series versus the incumbent solution in use at my organization (Forecast Pro), I have identified some series for which the SQL ARIMA-only algorithm produces values that don't make sense to me, given the input data.
In the graph below, the series Corp Sales represents the actual values, FP Predicted Sales are the predictions made by Forecast Pro, and Predicted Sales are the predictions resulting from a DMX query (below) against a model with parameters in the screenshot below (all default expect for ARIMA, Previous, and {12}).
SELECT FLATTENED
'ARIMA' as ForecastName,[Product Key],
PredictTimeSeries([Sales Volume], 12)
From
[ARIMA]Both FP and SQL were given the same input data, starting from Jan. 2009, and both were asked to provide 12 predictions starting at June 2011. Many other series in our experiments have produced results very comparable to (or better than) FP, but many have produced results that appear "wild".
Is there some parameter, setting or other that I can tune in order to reduce this wild behaviour? Is there something obviously wrong in what I have done?
I am very eager to replace FP with SQL in our environment, but I want to do it by demonstrating superiority (after showing non-inferiority) of the predictions.
Thanks!
Toate mesajele
-
27 iunie 2012 06:50Membru care oferă răspunsuri
Can you share your data for this one series?
What is the reason you are using ARIMA only? Typically ArtXP is better in short term predictions, that is why MIXED is the default option.
Tatyana Yakushev [PredixionSoftware.com]
-
27 iunie 2012 14:20
Hi Tatyana,
I don't see how to attach a file, so I have pasted in the series at the bottom. We did not start out trying to use ARIMA only. We have many products that we are forecasting, and in comparing to Forecast Pro's performance on a per-product basis, we had found some products that FP predicted reasonably well, while the default FORECAST_METHOD of MIXED in SSAS provided unreasonable results. Working on the premise that FP is using some ARIMA imlpementation, we thought to try and compare "apples to apples" for the poorly predicted products. That's when we ran into the even stranger behaviour.
To ensure that I haven't done anything wrong, I'm proving the DMX that I used to create the structure and the competing mining models (ARIMA with a PERIODICITY_HINT of 12, and all defaults)
create mining structure [Test Corp Structure] ( [Surrogate Month Key] long key time, [Product Key] long key, [Sales Volume] double continuous ); alter mining structure [Test Corp Structure] add mining model [Test ARIMA 12] ( [Surrogate Month Key], [Product Key], [Sales Volume] predict ) USING Microsoft_Time_Series (PERIODICITY_HINT = '{12}', FORECAST_METHOD = 'ARIMA') WITH DRILLTHROUGH; alter mining structure [Test Corp Structure] add mining model [Test Mixed] ( [Surrogate Month Key], [Product Key], [Sales Volume] predict ) USING Microsoft_Time_Series WITH DRILLTHROUGH; insert into mining structure [Test Corp Structure] ( [Surrogate Month Key], [Product Key], [Sales Volume] ) openquery (/*My query that produces the list of values below*/) SELECT FLATTENED [Product Key], PredictTimeSeries([Sales Volume], 12) From [Test ARIMA 12] SELECT FLATTENED [Product Key], PredictTimeSeries([Sales Volume], 12) From [Test MIXED]My values for the odd series:
monthkey,productkey,sales 109,1,70906 110,1,66340 111,1,78049 112,1,71565 113,1,69945 114,1,73206 115,1,74389 116,1,70767 117,1,70851 118,1,72306 119,1,72463 120,1,77079 121,1,64717 122,1,61542 123,1,71682 124,1,64146 125,1,67581 126,1,65707 127,1,65940 128,1,68094 129,1,66094 130,1,65620 131,1,67657 132,1,71005 133,1,60907 134,1,57097 135,1,66175 136,1,56577 137,1,65257
In doing some reading, I found that ARIMA can take additional parameters:
http://social.msdn.microsoft.com/Forums/sa/sqldatamining/thread/a9c07842-f761-481e-9f8a-f2b2bc7d9e19
ARIMA_AR_ORDER
ARIMA_MA_ORDER
ARIMA_DIFFERENCE_ORDER
But in researching these, I found two Connect cases that made me wary of ARIMA in general:
https://connect.microsoft.com/SQLServer/feedback/details/567897/arima-scale-sensitive#details
Thanks for any suggestions that you can provide!
-
28 iunie 2012 15:55Membru care oferă răspunsuri
Did you train your model on points 109-125?
I tried to create the same model and I got different predictions.
I see that you don't have a lot of historical data. Usually, it is recommended to have at least 3 years of data (3*periodicity).
Because you only have little data, I recommend you to evaluate ARTXP model. I think you will get more accurate predictions with it.
If short term predictions with ARTXP are good but long term predictions are bad, you can use MIXED algorithm but change PREDICTION_SMOOTHING parameter to be <0.5.
Tatyana Yakushev [PredixionSoftware.com]
- Propus ca răspuns de Eileen ZhaoMicrosoft Contingent Staff, Moderator 3 iulie 2012 09:51
- Anulare propunere ca răspuns de scrampy 3 iulie 2012 15:13
-
3 iulie 2012 15:23
Hi Tatyana,
Sorry for the delay in my response. I trained my model on points 109 to 137. Then I executed:
SELECT FLATTENED [Product Key], PredictTimeSeries([Sales Volume], 12) From [Test ARIMA 12]
to predict 12 points after 137 (138 onwards). For the graph that you posted, which model did you use [Test ARIMA 12] or [Test MIXED] ? Can you post the DMX that you used? Perhaps I'm doing something wrong (did my posted DMX make sense?)
With regards to the amount of history to use, we *have* more data, but the existing process in which Forecast Pro is used, uses this starting point for training data, and I want to match apples to apples. In addition, I used the same starting point of training data for all products that we are predicting for, and as I explained in my first post:
<quote>Many other series in our experiments have produced results very comparable to (or better than) FP, but many have produced results that appear "wild".</quote>
This was one example of a "wild" prediction. What is the variable that I'm not considering between products that produce results comparable to Forecast Pro with the same amount of historical data, vs. those that appear obviously wrong?
Thanks for your continued support!
-
9 iulie 2012 22:43Membru care oferă răspunsuriI used DM Add-ins for Excel to create the model and I trained it on points 109-125. This chart is for ARIMA only. You can try to change any of the parameters to check if you will get better predictions in your case (I would start with COMPLEXITY_PENALTY and MINIMUM_SUPPORT)
Tatyana Yakushev [PredixionSoftware.com]