Machine Learning Model Inaccurate

  • Question

  • Hi,

         I am using Azure ML Studio to predict bid prices. The columns are simple and similar, and the bid price ranges from about $0.20 to $2.00. I used regression, but the predictions are inaccurate: the coefficient of determination (R squared) is near 0.

    Please help.

    Friday, August 7, 2015 10:40 PM
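For reference, the coefficient of determination compares a model's squared errors with those of always predicting the mean price: R² = 1 − SS_res / SS_tot. An R squared near 0 therefore means the model does about as well as predicting the average price for every row. A minimal pure-Python sketch (the function name `r_squared` is our own, not an Azure ML API), using the first five prices quoted later in this thread:

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))  # model's squared error
    ss_tot = sum((a - mean) ** 2 for a in actual)                   # error of always predicting the mean
    return 1 - ss_res / ss_tot


actual = [0.338807656, 0.350054826, 0.43, 0.45, 0.5]
mean_price = sum(actual) / len(actual)

print(r_squared(actual, actual))                      # 1.0 (perfect predictions)
print(r_squared(actual, [mean_price] * len(actual)))  # 0.0 (always predict the mean)
```

A score near 0 on the Evaluate Model output is the same situation as the second line: the features are not helping the model beat the average.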


All replies

  • Hi,

    It sounds like you are trying to predict a continuous bid price ranging from $0.20 to $2.00 using regression. Have you tried the other built-in regression modules in Azure ML?

    Also, if your model does not improve, are there any additional variables you could use to help predict the dependent variable?

    If you are new to Azure ML, the gallery provides a range of samples you could refer to: https://gallery.azureml.net/browse/?categories=["Experiment"]&examples=true

    Regards,
    Jaya.

    Sunday, August 9, 2015 8:30 PM
  • Which model are you using, and how are you trying to optimize it? Be careful with linear models when your data does not have a linear relationship, because they cannot capture it. As Jaya suggested, try other built-in models. Decision Forest Regression is often a strong choice for regression tasks.
    Monday, August 10, 2015 1:54 PM
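To illustrate the point about linear models: if the price depends on a feature in a curved way, a straight-line fit can score an R squared near 0 even though the feature is highly informative. The sketch below uses synthetic data (y = x² on a symmetric range, scored on the training points for simplicity) and compares an ordinary least-squares line with a piecewise-constant model that averages y within bins of x. The binned model is only a crude, hypothetical stand-in for the way tree-based learners such as Decision Forest Regression split the input into regions; it is not that algorithm.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    return lambda x: a + b * x


def fit_bins(xs, ys, n_bins=5):
    """Piecewise-constant model: predict the mean y of the bin that x falls in."""
    lo, hi = min(xs), max(xs)
    width = (hi - lo) / n_bins

    def bin_of(x):
        # Clamp so x == hi (and values outside the training range) map to a valid bin.
        return min(max(int((x - lo) / width), 0), n_bins - 1)

    sums, counts = [0.0] * n_bins, [0] * n_bins
    for x, y in zip(xs, ys):
        sums[bin_of(x)] += y
        counts[bin_of(x)] += 1
    overall = sum(ys) / len(ys)
    # Empty bins fall back to the overall mean.
    means = [s / c if c else overall for s, c in zip(sums, counts)]
    return lambda x: means[bin_of(x)]


# Synthetic data with a strong but non-linear relationship: y = x^2 on [-1, 1].
xs = [i / 10 for i in range(-10, 11)]
ys = [x * x for x in xs]

line = fit_line(xs, ys)
bins = fit_bins(xs, ys)
r2_line = r_squared(ys, [line(x) for x in xs])
r2_bins = r_squared(ys, [bins(x) for x in xs])

print(f"straight line: R^2 = {r2_line:.3f}")  # close to 0
print(f"binned means:  R^2 = {r2_bins:.3f}")
```

The line is flat because the rising and falling halves cancel out, so its R squared is about 0, while the binned model recovers most of the variance from the same single feature.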
  • Yes, I did. The variables I have are:
    Time, Auction ID, Ad ID, Ad Spot, Token
    Which ones can help improve the prediction? I only used one column, price, for prediction, and it gave me the Scored Labels.
    The model I built is: data -> scrubber -> split (fraction 0.7).
    I trained it with 4 models and got almost the same result from each.
    I scored and evaluated the model.

    Na

    Monday, August 10, 2015 4:36 PM
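The module chain described in this post (split, train, score, evaluate) expects price to be the label column and the other columns to be features. If price is the only column, a regression model has nothing to learn from and can do no better than the training mean, which scores about 0 (or slightly below) on the test rows. A minimal plain-Python sketch on synthetic data; the `hour` feature, the price formula, and the helper names are hypothetical, and a least-squares line stands in for the Azure ML regression modules:

```python
import random


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


def fit_line(xs, ys):
    """Least-squares line y = a + b * x."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return lambda x: a + b * x


# Synthetic (hour, price) rows, for illustration only: price rises with hour.
rng = random.Random(0)
rows = []
for _ in range(200):
    hour = rng.randrange(24)
    rows.append((hour, 0.3 + 0.01 * hour + rng.gauss(0, 0.02)))

# Random 70/30 split (the split step).
rng.shuffle(rows)
cut = round(len(rows) * 0.7)
train, test = rows[:cut], rows[cut:]
actual = [price for _, price in test]

# Label only: with no feature columns, the best prediction is the training mean.
train_mean = sum(price for _, price in train) / len(train)
r2_label_only = r_squared(actual, [train_mean] * len(test))

# Label plus one informative feature (hour).
model = fit_line([h for h, _ in train], [p for _, p in train])
r2_with_feature = r_squared(actual, [model(h) for h, _ in test])

print(f"label only: R^2 = {r2_label_only:.3f}")
print(f"with hour:  R^2 = {r2_with_feature:.3f}")
```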
  • I tried:

    Bayesian linear regression
    linear regression
    neural network regression
    Poisson regression

    The prices: 0.338807656, 0.350054826, 0.43, 0.45, 0.5, 0.416236925, 0.4, 0.5, 0.409648277, 0.46, 0.415129924, 0.420575754, 0.48, 0.413566776, 0.432664613, 0.354607979, 0.42...

    What other models can I try?


    Na

    Monday, August 10, 2015 4:44 PM
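The prices quoted in this post sit in a narrow band. Summarizing them shows how much variation there is for any model to explain: a model that always predicts the mean has an RMSE equal to the population standard deviation, and that is the baseline an R squared of 0 corresponds to. A small sketch using only the listed values (the ones elided by "..." are not included):

```python
import statistics

# Prices quoted in the post above.
prices = [0.338807656, 0.350054826, 0.43, 0.45, 0.5, 0.416236925, 0.4, 0.5,
          0.409648277, 0.46, 0.415129924, 0.420575754, 0.48, 0.413566776,
          0.432664613, 0.354607979, 0.42]

mean = statistics.mean(prices)
spread = statistics.pstdev(prices)  # population standard deviation

# RMSE of always predicting the mean; a model must beat this to score R^2 > 0.
rmse_mean = (sum((p - mean) ** 2 for p in prices) / len(prices)) ** 0.5

print(f"n={len(prices)}, min={min(prices)}, max={max(prices)}")
print(f"mean={mean:.4f}, std={spread:.4f}, mean-only RMSE={rmse_mean:.4f}")
```

If all four models land on roughly this error, they are most likely falling back to the mean because the input columns carry little usable signal, which would explain why changing the algorithm made no difference.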
  • Hi,

    Since all the models seem to get you similar results, here are some suggestions:

    1) You might want to check if there are any additional variables that can be added to your model.

    2) Are there some variables you can derive from your existing dataset? For example, from the timestamp variable, does it make sense to add time of day as a variable?

    3) Are there too many categories in these variables: Time, Auction ID, Ad ID, Ad Spot, Token? Does grouping some of them make sense in your use case?

    Regards,
    Jaya.  

    Monday, August 10, 2015 6:10 PM
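Point 3 can be sketched as follows. Columns where nearly every row has its own value (an auction ID, for instance) give a model little that carries over to new rows, so rare categories are often merged into a single bucket. A minimal sketch; `group_rare`, the threshold, and the Ad Spot values are all hypothetical:

```python
from collections import Counter


def group_rare(values, min_count):
    """Map categories that occur fewer than min_count times to 'Other'."""
    counts = Counter(values)
    return [v if counts[v] >= min_count else "Other" for v in values]


# Hypothetical Ad Spot values, for illustration only.
spots = ["top", "top", "sidebar", "footer", "top", "sidebar", "popup"]
print(group_rare(spots, min_count=2))
# ['top', 'top', 'sidebar', 'Other', 'top', 'sidebar', 'Other']
```

The threshold is a judgment call: too low and the column stays sparse, too high and genuinely different categories get merged.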
  • Jaya,

    I suspect time affects the price, but not significantly. If it does, will adding it improve the Scored Labels and the coefficient of determination?

    Where and how do I add a new variable or grouping: in the Split module or in the input data?

    Thanks!


    Na

    Monday, August 10, 2015 11:45 PM
  • Hi,

    There are many ways of adding new variables; it depends on your preference. I generally do a lot of data manipulation by writing code in either the 'Execute R Script' or 'Execute Python Script' module.

    For simpler manipulation, you could try built-in modules such as 'Apply Math Operation'.

    Regards,
    Jaya.

    Tuesday, August 11, 2015 2:25 PM
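As a sketch of the script-module route: in Azure ML Studio (classic), the Execute Python Script module calls a function named `azureml_main`, which receives the module's input datasets as pandas DataFrames and returns a sequence whose first DataFrame becomes the module's output. Placing such a module between the dataset and the Split module means both the training and test rows get the new columns. This sketch assumes the input has a timestamp column named `Time`; that name, and the derived `Hour` and `Weekday` columns, are hypothetical and should be adapted to your data:

```python
import pandas as pd


def azureml_main(dataframe1=None, dataframe2=None):
    # Work on a copy of the first input dataset.
    df = dataframe1.copy()
    # 'Time' is an assumed column name; change it to match your data.
    ts = pd.to_datetime(df["Time"])
    df["Hour"] = ts.dt.hour          # 0-23
    df["Weekday"] = ts.dt.dayofweek  # Monday = 0
    # Return a sequence of DataFrames; the first becomes the module output.
    return df,


# Local usage example with a single hypothetical row.
sample = pd.DataFrame({"Time": ["2015-08-07 22:40:00"], "Price": [0.43]})
(result,) = azureml_main(sample)
print(result)
```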
  • Jaya,

      I took out the ID columns and kept price and bid time, and it improved. But what variables can I add to make it better? Where and how do I add them, for example ad value, width, height, or floor/max price?

    Thanks!

      


    Na

    Wednesday, August 12, 2015 12:40 AM
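Derived columns like the ones asked about here are computed per row from the input columns. A minimal sketch, assuming hypothetical `width`, `height`, `floor_price` and `max_price` columns (none of these exist in the dataset described so far). One design point worth keeping: derive features only from inputs, never from the bid price itself, because a feature computed from the label leaks the answer into training and inflates the evaluation scores.

```python
def ad_features(width, height, floor_price, max_price):
    """Hypothetical derived features for one ad slot.

    The bid price (the label) is deliberately not an argument, so none of
    these features can leak the value being predicted.
    """
    return {
        "area": width * height,                    # pixels
        "aspect_ratio": width / height,
        "floor_price": floor_price,                # usable directly as a feature
        "price_range": max_price - floor_price,    # room between floor and cap
    }


print(ad_features(300, 250, 0.25, 2.0))
```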
  • Hi,

    Some examples in the Gallery might be a good reference: https://gallery.azureml.net/

    As stated earlier,

    There are many ways of adding new variables; it depends on your preference. I generally do a lot of data manipulation by writing code in either the 'Execute R Script' or 'Execute Python Script' module. For simpler manipulation, you could try built-in modules such as 'Apply Math Operation'.

    https://msdn.microsoft.com/library/azure/6bd12c13-d9c3-4522-94d3-4aa44513af57

    Regards,
    Jaya.



    Wednesday, August 12, 2015 2:05 PM
  • The link to the examples does not open. Are there any other examples that refine a regression model or add variables?

    Why does a 0.5 split affect the coefficient value? In my model, the larger the percentage of training data, the worse the coefficient of determination.


    Na

    Wednesday, August 12, 2015 10:20 PM
  • Hi,

    I am not sure why the Gallery page does not work; here is another way to view some sample experiments.

    In the top panel, select 'Gallery' to view sample experiments.

    You can also view samples by selecting the 'SAMPLES' option.

    Depending on the nature of your dataset, some derived variables might improve your model.

    Regards,
    Jaya.

    Wednesday, August 12, 2015 11:06 PM
  • Jaya,

         Which sample in the Gallery uses regression with a derived variable?

         Also, why does a 0.5 split make the coefficient value better? Is a smaller test portion good or bad?


    Na

    Friday, August 21, 2015 12:49 AM
  • Hi,

    I would recommend reviewing the samples published under 'Samples' in detail. Some of them show how to use the modules to create derived variables.

    I do not expect you to find a sample that exactly matches your use case, but the samples will show you how to use the various modules.

    Typical splits between training and test data are 60:40, 70:30, or 80:20, but users sometimes choose a different split that suits their use case.

    Regards,
    Jaya.



    Friday, August 21, 2015 1:12 PM
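One reason the coefficient of determination moves when the split ratio changes is that it is measured on the held-out rows, so a different split means a different, and sometimes small, test set. A minimal sketch on synthetic data repeats the split with several random seeds at each ratio to show how much R squared varies from the split alone. The data, the helper names, and the least-squares line (a stand-in for the Azure ML regression modules) are all hypothetical:

```python
import random
import statistics


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


def fit_line(xs, ys):
    """Least-squares line y = a + b * x."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return lambda x: (my - b * mx) + b * x


def split_rows(rows, fraction, seed):
    """Shuffle a copy of rows with a fixed seed; the first `fraction` is training data."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * fraction)
    return shuffled[:cut], shuffled[cut:]


# Synthetic (hour, price) rows, for illustration only.
rng = random.Random(0)
rows = []
for _ in range(100):
    hour = rng.randrange(24)
    rows.append((hour, 0.3 + 0.01 * hour + rng.gauss(0, 0.05)))

results = {}
for fraction in (0.5, 0.6, 0.7, 0.8):
    scores = []
    for seed in range(20):  # 20 different random splits at the same ratio
        train, test = split_rows(rows, fraction, seed)
        model = fit_line([h for h, _ in train], [p for _, p in train])
        scores.append(r_squared([p for _, p in test], [model(h) for h, _ in test]))
    results[fraction] = scores
    print(f"train {fraction:.0%}: test R^2 ranges {min(scores):.2f} to {max(scores):.2f} "
          f"(mean {statistics.mean(scores):.2f})")
```

If the range at a single ratio is as wide as the differences between ratios, the change seen after moving the split is likely noise from which rows landed in the test set rather than a real effect of the ratio.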
  • Hi, Jaya,

         Will a 50/50 split affect the performance of the model? How does it relate to the published web service that predicts real-world cases?

         I don't see any example that adds a variable to make a regression model predict better.

         Is there an explanation or document with a good example of this?

    Thanks!


    Na

    Tuesday, September 1, 2015 11:52 PM
  • Hi,

    If your data is randomly split 50:50, it should not affect your model performance.

    Here is some documentation on web services: https://azure.microsoft.com/en-us/documentation/articles/machine-learning-publish-a-machine-learning-web-service/

    Here is some documentation on interpreting model results: https://azure.microsoft.com/en-us/documentation/articles/machine-learning-interpret-model-results/

    There is no example with your exact use case.

    Regards,
    Jaya.

    Wednesday, September 2, 2015 1:39 PM