linear regression simple

Question

• I am trying to build and run a DMX linear regression to predict some data.

The data is in the format:

key_id, 1979,1980 - 2010, sales2010, sales2011

Example:

1,100,200, ...   , 500,600  (many rows of this).

The x values are the 1979 - 2010 numbers.

The y value would be sales2010.

How would you write this in the DMX language? (How do you designate the x and y value data?)

So in designing the model, my input values are the x values and the predicted value is the y value?

I am uncertain how to relate the model to the lr function in R.

foxjazz

• Edited: Saturday, June 9, 2012 17:11
Saturday, June 9, 2012 00:00

Answers

• Why don't you open model viewer and check the regression formula that was created? (You can see it in the tree viewer when you click on the node). Does it look correct to you?

Tatyana Yakushev [PredixionSoftware.com]

• Marked as answer: Monday, June 25, 2012 19:52
Wednesday, June 13, 2012 23:59

All replies

• You need to mark columns 1979-2010 as input and sales 2011 as predict.

This way you can create a linear regression model predicting 2011 sales based on the numbers for 1979-2010.
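As a rough illustration of what marking columns as input and predict amounts to, here is a minimal least squares sketch in plain Python. It is not DMX and not how Analysis Services fits the model internally; the toy rows, the coefficient values, and the helper names `solve`, `fit_linear_regression`, and `predict` are all made up for the example.

```python
from fractions import Fraction

def solve(a, b):
    """Solve the square system a * x = b by Gauss-Jordan elimination."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Find a usable pivot row (raises StopIteration if the system is singular).
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        m[col] = [v / m[col][col] for v in m[col]]
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [vr - factor * vc for vr, vc in zip(m[r], m[col])]
    return [m[i][n] for i in range(n)]

def fit_linear_regression(xs, ys):
    """Least squares via the normal equations; returns [intercept, b1, b2, ...]."""
    design = [[Fraction(1)] + [Fraction(v) for v in row] for row in xs]
    k = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * Fraction(y) for r, y in zip(design, ys)) for i in range(k)]
    return solve(xtx, xty)

def predict(beta, row):
    """Apply the fitted formula y = intercept + b1*x1 + b2*x2 + ..."""
    return beta[0] + sum(b * v for b, v in zip(beta[1:], row))

# Hypothetical toy data: one row per key, one input (x) column per year.
xs = [[1, 2, 3], [2, 1, 5], [3, 4, 2], [4, 3, 6], [5, 6, 1], [6, 5, 4]]
# Target (y) generated from known coefficients so the fit can be checked.
ys = [10 + 2 * a + b + 3 * c for a, b, c in xs]

beta = fit_linear_regression(xs, ys)
print([int(v) for v in beta])          # intercept followed by one coefficient per input
print(int(predict(beta, [7, 7, 7])))   # prediction for a new row of inputs
```

The inputs play the role of the x columns and the target plays the role of the predict column; each key gets its own prediction because its inputs differ.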

However, I don't think this model is going to be good.

You can get a more accurate model if you use time series algorithms. To use time series, you need to organize your data differently.

One way to organize data for time series is

timestamp  key1_sales  key2_sales  ....
1979       100         120         .....
1980       102         115         .....
2010       253         172         ....
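The reorganization described above is essentially a transpose: one row per key becomes one row per timestamp. A minimal sketch in plain Python, assuming the data starts as one list of yearly values per key; the function name `wide_to_time_series` is hypothetical, and only the three years shown in the layout are used.

```python
def wide_to_time_series(rows, years):
    """Turn one-row-per-key data into one-row-per-timestamp data.

    rows  : list of (key, [value for each year, in the same order as years])
    years : list of timestamps
    """
    header = ["timestamp"] + [f"key{key}_sales" for key, _ in rows]
    table = []
    for i, year in enumerate(years):
        # Collect the i-th yearly value of every key into a single row.
        table.append([year] + [values[i] for _, values in rows])
    return header, table

# Values taken from the layout above; the years in between are omitted there too.
years = [1979, 1980, 2010]
rows = [(1, [100, 102, 253]), (2, [120, 115, 172])]

header, table = wide_to_time_series(rows, years)
print(header)
for line in table:
    print(line)
```

Each key ends up as its own series column, which is the shape the layout above describes.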

Tatyana Yakushev [PredixionSoftware.com]

Monday, June 11, 2012 04:44
• That makes a lot of sense.

However, you were going on the assumption that I was using sales data.

I am using VIO data, where:

partnumber = key

1979 car fits vio (applications * vio) for 1979 vehicles  ... 2010 vehicles.

Time series wouldn't work, would it?

If I just had sales data, I would agree. I do have 2 years of sales data, and sales2010 is an aggregate of 2010 sales.

How would you mix (add) 2 models to leverage both time series with sales and VIO data for predicting sales?

foxjazz

Monday, June 11, 2012 14:23
• What is VIO?

Tatyana Yakushev [PredixionSoftware.com]

Monday, June 11, 2012 17:11
• Vehicles in operation.

So part 1234 fits 10 different kinds of vehicles that have 500 in operation.

foxjazz

Monday, June 11, 2012 20:08

Tatyana Yakushev [PredixionSoftware.com]

Monday, June 11, 2012 21:19
• Well, first I just need to master the linear regression model.

I figure I would limit it by productType (for example, water pumps).

So when I add the model, the key, and the predicted sales (predict), I am getting a single constant number across the whole dataset, where it should differ by key. It had done that correctly before, and now I don't know why it is giving me strange data.

Maybe I have too many x values?

foxjazz

Tuesday, June 12, 2012 21:05
• So basically LR is supposed to build coefficients across the keys, and the predict function is supposed to use those coefficients with the x values to predict y.

y is "sales"

However, the predict function doesn't appear to be doing that.

This is the prediction join query that I ran:

```
SELECT
t.[Sales2010],
t.[Sales2011],
[Vw Water Pumps].[Sales2010],
[Vw Water Pumps].[Sales2011],
t.[productcontrolid]
FROM
[Vw Water Pumps]
PREDICTION JOIN
OPENQUERY([Intellitrak],
'SELECT
[Sales2010],
[Sales2011],
[productcontrolid],
[d1974],
[d1975],
[d1976],
[d1977],
[d1978],
[d1979],
[d1980],
[d1981],
[d1982],
[d1983],
[d1984],
[d1985],
[d1986],
[d1987],
[d1988],
[d1989],
[d1990],
[d1991],
[d1992],
[d1993],
[d1994],
[d1995],
[d1996],
[d1997],
[d1998],
[d1999],
[d2000],
[d2001],
[d2002],
[d2003],
[d2004],
[d2005],
[d2006],
[d2007],
[d2008],
[d2009],
[d2010],
[d2011],
[d2012]
FROM
[dbo].[vwWaterPumps]
') AS t
ON
[Vw Water Pumps].[d1974] = t.[d1974] AND
[Vw Water Pumps].[d1975] = t.[d1975] AND
[Vw Water Pumps].[d1976] = t.[d1976] AND
[Vw Water Pumps].[d1977] = t.[d1977] AND
[Vw Water Pumps].[d1978] = t.[d1978] AND
[Vw Water Pumps].[d1979] = t.[d1979] AND
[Vw Water Pumps].[d1980] = t.[d1980] AND
[Vw Water Pumps].[d1981] = t.[d1981] AND
[Vw Water Pumps].[d1982] = t.[d1982] AND
[Vw Water Pumps].[d1983] = t.[d1983] AND
[Vw Water Pumps].[d1984] = t.[d1984] AND
[Vw Water Pumps].[d1985] = t.[d1985] AND
[Vw Water Pumps].[d1986] = t.[d1986] AND
[Vw Water Pumps].[d1987] = t.[d1987] AND
[Vw Water Pumps].[d1988] = t.[d1988] AND
[Vw Water Pumps].[d1989] = t.[d1989] AND
[Vw Water Pumps].[d1990] = t.[d1990] AND
[Vw Water Pumps].[d1991] = t.[d1991] AND
[Vw Water Pumps].[d1992] = t.[d1992] AND
[Vw Water Pumps].[d1993] = t.[d1993] AND
[Vw Water Pumps].[d1994] = t.[d1994] AND
[Vw Water Pumps].[d1995] = t.[d1995] AND
[Vw Water Pumps].[d1996] = t.[d1996] AND
[Vw Water Pumps].[d1997] = t.[d1997] AND
[Vw Water Pumps].[d1998] = t.[d1998] AND
[Vw Water Pumps].[d1999] = t.[d1999] AND
[Vw Water Pumps].[d2000] = t.[d2000] AND
[Vw Water Pumps].[d2001] = t.[d2001] AND
[Vw Water Pumps].[d2002] = t.[d2002] AND
[Vw Water Pumps].[d2003] = t.[d2003] AND
[Vw Water Pumps].[d2004] = t.[d2004] AND
[Vw Water Pumps].[d2005] = t.[d2005] AND
[Vw Water Pumps].[d2006] = t.[d2006] AND
[Vw Water Pumps].[d2007] = t.[d2007] AND
[Vw Water Pumps].[d2008] = t.[d2008] AND
[Vw Water Pumps].[d2009] = t.[d2009] AND
[Vw Water Pumps].[d2010] = t.[d2010] AND
[Vw Water Pumps].[d2011] = t.[d2011] AND
[Vw Water Pumps].[d2012] = t.[d2012] AND
[Vw Water Pumps].[Productcontrolid] = t.[productcontrolid]
```

The result sample looks like this (in the order of the SELECT list, the first two columns come from the source query t and the next two from the model [Vw Water Pumps]):

Sales2010  Sales2011  Sales2010  Sales2011  productcontrolid
237        445        1804       1904       25041
237        445        1804       1904       25041
133        388        1804       1904       20790
133        388        1804       1904       20790
5531       4975       1804       1904       21128
5531       4975       1804       1904       21128
1887       2144       1804       1904       21068
1887       2144       1804       1904       21068
273        163        1804       1904       20707
273        163        1804       1904       20707
10         3          1804       1904       20962
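One pattern that produces the same number for every key, as in the 1804 and 1904 columns above, is a regression formula that ended up with no regressors at all, so that only a constant term is left. Whether that is what happened in this model is an assumption; it can only be confirmed by looking at the formula in the model viewer. The sketch below is plain Python, not DMX, and the helper names `fit_constant_model` and `sum_squared_error` are hypothetical. It only shows that a constant-only least squares fit returns the mean of y for every row.

```python
def fit_constant_model(ys):
    """Least squares with no regressors: the best constant is the mean of y."""
    return sum(ys) / len(ys)

def sum_squared_error(ys, constant):
    """Squared error of predicting the same constant for every row."""
    return sum((y - constant) ** 2 for y in ys)

# Actual Sales2010 values copied from the distinct rows of the result sample.
# This is only a sample, so the fitted constant is not expected to equal 1804.
actual_sales_2010 = [237, 133, 5531, 1887, 273, 10]
product_ids = [25041, 20790, 21128, 21068, 20707, 20962]

constant = fit_constant_model(actual_sales_2010)

# With no regressors, every key receives the same prediction.
predictions = {pid: constant for pid in product_ids}

for pid, value in predictions.items():
    print(pid, round(value, 2))
```

If the formula shown in the viewer contains only a constant term, identical predictions across keys are the expected behaviour rather than a fault in the prediction query.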

• Edited: Tuesday, June 12, 2012 22:28
Tuesday, June 12, 2012 21:42