linear regression simple
-
Saturday, 09 June 2012 00:00
I am trying to build and run a DMX linear regression to predict some data.
The data is in the format:
key_id, 1979, 1980, ..., 2010, sales2010, sales2011
Example:
1, 100, 200, ..., 500, 600 (many rows like this).
The x values are the numbers in the 1979 - 2010 columns.
The y value would be sales2010.
How would you write this in the DMX language? (How do you designate the x and y value data?)
So in designing the model, my input values are the x values and the predicted value is the y value?
I am uncertain how to relate the model to the lm function in R.
foxjazz
- Edited by foxjazz2 Saturday, 09 June 2012 17:11
All replies
-
Monday, 11 June 2012 04:44 (Answerer)
You need to mark columns 1979-2010 as input and sales 2011 as predict.
This way you can create a linear regression model that predicts 2011 sales based on the numbers for 1979-2010.
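To make the input/predict split concrete, here is a minimal sketch in Python (not DMX) of what a linear regression does with such a table: the year columns play the role of x, the predicted sales column plays the role of y, and the key only labels the row. The function names `solve`, `fit` and `predict`, and the tiny two-input data set, are made up for illustration. The Microsoft algorithm may select or drop inputs on its own, so its result need not match a plain least-squares fit like this one.

```python
def solve(a, b):
    """Solve the linear system a * x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("input columns are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit(inputs, target):
    """Ordinary least squares. Returns [intercept, coef_1, ..., coef_k]."""
    rows = [[1.0] + list(r) for r in inputs]  # leading 1.0 carries the intercept
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, target)) for i in range(k)]
    return solve(xtx, xty)


def predict(coefs, row):
    """Apply the fitted coefficients to one row of input (x) values."""
    return coefs[0] + sum(c * v for c, v in zip(coefs[1:], row))


# Hypothetical data: key -> ((two input columns), predicted column).
# Generated from y = 10 + 0.5 * x1 + 2 * x2 so the fit can be checked by hand.
table = {
    1: ((100.0, 200.0), 460.0),
    2: ((150.0, 180.0), 445.0),
    3: ((120.0, 260.0), 590.0),
    4: ((90.0, 210.0), 475.0),
    5: ((200.0, 150.0), 410.0),
}

# The key is not an input; only the x columns and the y column are used.
coefs = fit([x for x, _ in table.values()], [y for _, y in table.values()])
estimate = predict(coefs, (110.0, 190.0))
print(coefs, estimate)
```

Note that one set of coefficients is fitted across all keys; the prediction differs per row only because each row supplies different x values.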
However, I don't think this model is going to be good.
You can get a more accurate model if you use time series algorithms. To use time series, you need to organize your data differently.
One way to organize data for time series is:
timestamp  key1_sales  key2_sales  ...
1979       100         120         ...
1980       102         115         ...
2010       253         172         ...
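The reshaping described above (one row per key turned into one row per year) can be sketched as follows. This is an illustrative Python sketch, not part of any Analysis Services tooling; the function name `to_time_series` and the `key<N>_sales` column naming are assumptions, and the values are the ones shown in the layout above.

```python
def to_time_series(wide_rows, years):
    """Turn {key: [value for each year]} into a header plus one row per year."""
    keys = sorted(wide_rows)
    header = ["timestamp"] + [f"key{k}_sales" for k in keys]
    rows = []
    for i, year in enumerate(years):
        # Pick the i-th yearly value of every key and put them side by side.
        rows.append([year] + [wide_rows[k][i] for k in keys])
    return header, rows


# Only the years shown in the layout above; the years in between are omitted.
years = [1979, 1980, 2010]
wide = {
    1: [100, 102, 253],  # key 1, one value per year
    2: [120, 115, 172],  # key 2, one value per year
}

header, rows = to_time_series(wide, years)
for line in [header] + rows:
    print(line)
```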
Tatyana Yakushev [PredixionSoftware.com]
-
Monday, 11 June 2012 14:23
That makes a lot of sense.
However, you were going on the assumption that I was using sales data.
I am using VIO data, laid out as follows:
part number = key
Each year column, 1979 through 2010, holds the VIO of the vehicles from that model year that the part fits (applications * VIO).
Time series wouldn't work, would it?
If I just had sales data, I would agree, and I do have 2 years of sales data. Sales2010 is an aggregate of 2010 sales.
How would you mix (add) 2 models to leverage both time series with sales and VIO data for predicting sales?
foxjazz
-
Monday, 11 June 2012 17:11 (Answerer)
What is VIO?
Tatyana Yakushev [PredixionSoftware.com]
-
Monday, 11 June 2012 20:08
Vehicles in operation.
So, for example, part 1234 fits 10 different kinds of vehicles that have 500 in operation.
foxjazz
-
Monday, 11 June 2012 21:19 (Answerer)
Sorry, I still don't understand your business problem.
Are you looking for additional guidance, or have you already got all the answers?
Tatyana Yakushev [PredixionSoftware.com]
-
Tuesday, 12 June 2012 21:05
Well, first I just need to master the linear regression model.
I figure I would limit it by product type (for example, water pumps).
When I add the model, the key, and the predicted sales (predict), I get the same number across the whole dataset, where it should differ by key. It did differ by key before, and now I don't know why it is giving me strange data.
Maybe I have too many x values?
foxjazz
-
Tuesday, 12 June 2012 21:42
So basically LR is supposed to build coefficients across the keys, and the predict function is supposed to use those coefficients with the x values to predict y.
y is "sales".
However, the predict function doesn't appear to be doing that.
This is the prediction join query that I have run:
SELECT
  t.[Sales2010], t.[Sales2011],
  [Vw Water Pumps].[Sales2010], [Vw Water Pumps].[Sales2011],
  t.[productcontrolid]
FROM [Vw Water Pumps]
PREDICTION JOIN
  OPENQUERY([Intellitrak],
    'SELECT [Sales2010], [Sales2011], [productcontrolid],
       [d1974], [d1975], [d1976], [d1977], [d1978], [d1979], [d1980], [d1981],
       [d1982], [d1983], [d1984], [d1985], [d1986], [d1987], [d1988], [d1989],
       [d1990], [d1991], [d1992], [d1993], [d1994], [d1995], [d1996], [d1997],
       [d1998], [d1999], [d2000], [d2001], [d2002], [d2003], [d2004], [d2005],
       [d2006], [d2007], [d2008], [d2009], [d2010], [d2011], [d2012]
     FROM [dbo].[vwWaterPumps] ') AS t
ON
  [Vw Water Pumps].[d1974] = t.[d1974] AND
  [Vw Water Pumps].[d1975] = t.[d1975] AND
  [Vw Water Pumps].[d1976] = t.[d1976] AND
  [Vw Water Pumps].[d1977] = t.[d1977] AND
  [Vw Water Pumps].[d1978] = t.[d1978] AND
  [Vw Water Pumps].[d1979] = t.[d1979] AND
  [Vw Water Pumps].[d1980] = t.[d1980] AND
  [Vw Water Pumps].[d1981] = t.[d1981] AND
  [Vw Water Pumps].[d1982] = t.[d1982] AND
  [Vw Water Pumps].[d1983] = t.[d1983] AND
  [Vw Water Pumps].[d1984] = t.[d1984] AND
  [Vw Water Pumps].[d1985] = t.[d1985] AND
  [Vw Water Pumps].[d1986] = t.[d1986] AND
  [Vw Water Pumps].[d1987] = t.[d1987] AND
  [Vw Water Pumps].[d1988] = t.[d1988] AND
  [Vw Water Pumps].[d1989] = t.[d1989] AND
  [Vw Water Pumps].[d1990] = t.[d1990] AND
  [Vw Water Pumps].[d1991] = t.[d1991] AND
  [Vw Water Pumps].[d1992] = t.[d1992] AND
  [Vw Water Pumps].[d1993] = t.[d1993] AND
  [Vw Water Pumps].[d1994] = t.[d1994] AND
  [Vw Water Pumps].[d1995] = t.[d1995] AND
  [Vw Water Pumps].[d1996] = t.[d1996] AND
  [Vw Water Pumps].[d1997] = t.[d1997] AND
  [Vw Water Pumps].[d1998] = t.[d1998] AND
  [Vw Water Pumps].[d1999] = t.[d1999] AND
  [Vw Water Pumps].[d2000] = t.[d2000] AND
  [Vw Water Pumps].[d2001] = t.[d2001] AND
  [Vw Water Pumps].[d2002] = t.[d2002] AND
  [Vw Water Pumps].[d2003] = t.[d2003] AND
  [Vw Water Pumps].[d2004] = t.[d2004] AND
  [Vw Water Pumps].[d2005] = t.[d2005] AND
  [Vw Water Pumps].[d2006] = t.[d2006] AND
  [Vw Water Pumps].[d2007] = t.[d2007] AND
  [Vw Water Pumps].[d2008] = t.[d2008] AND
  [Vw Water Pumps].[d2009] = t.[d2009] AND
  [Vw Water Pumps].[d2010] = t.[d2010] AND
  [Vw Water Pumps].[d2011] = t.[d2011] AND
  [Vw Water Pumps].[d2012] = t.[d2012] AND
  [Vw Water Pumps].[Productcontrolid] = t.[productcontrolid]
The result sample looks like this:
Sales2010 (t)  Sales2011 (t)  Sales2010 (model)  Sales2011 (model)  productcontrolid
237            445            1804               1904               25041
237            445            1804               1904               25041
133            388            1804               1904               20790
133            388            1804               1904               20790
5531           4975           1804               1904               21128
5531           4975           1804               1904               21128
1887           2144           1804               1904               21068
1887           2144           1804               1904               21068
273            163            1804               1904               20707
273            163            1804               1904               20707
10             3              1804               1904               20962
-
Wednesday, 13 June 2012 23:59 (Answerer)
Why don't you open the model viewer and check the regression formula that was created? (You can see it in the tree viewer when you click on the node.) Does it look correct to you?
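As a rough illustration of what to look for in that formula: a regression formula is an intercept plus one term per regressor. If the formula contains an intercept and no regressor terms, every row receives the same prediction whatever its inputs are. That would be one possible explanation for the constant 1804 and 1904 in the result sample, although nothing in this thread confirms it. The sketch below is plain Python, not an Analysis Services API; the function name `evaluate`, the coefficient values and the input figures are hypothetical, and only the two key values are taken from the result sample.

```python
def evaluate(intercept, coefs, row):
    """Apply y = intercept + sum(coef * x) to one row of named inputs."""
    return intercept + sum(coefs.get(name, 0.0) * value for name, value in row.items())


# Hypothetical input values for two keys from the result sample.
rows = {
    25041: {"d2009": 300.0, "d2010": 420.0},
    20790: {"d2009": 120.0, "d2010": 95.0},
}

# Formula with no regressor terms: the prediction ignores the inputs.
constant_only = {key: evaluate(1804.0, {}, row) for key, row in rows.items()}

# Formula with regressor terms (made-up coefficients): the prediction varies by row.
with_terms = {
    key: evaluate(50.0, {"d2009": 0.5, "d2010": 1.0}, row)
    for key, row in rows.items()
}

print(constant_only)
print(with_terms)
```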
Tatyana Yakushev [PredixionSoftware.com]
- Marked as answer by foxjazz2 Monday, 25 June 2012 19:52