SQL Server Developer Center > SQL Server Forums > Data Mining > How to choose the right model for deployment ?

How to choose the right model for deployment?

  • Sunday, November 01, 2009 11:03 AM, desserts
     
    Model A = considerably more accurate than Model B on training data
    Model B = more accurate than Model A on validation data

    So which one would you consider for final deployment, and why?

    I was thinking that Model B is better.

All Replies

  • Monday, November 02, 2009 3:28 PM, rok1
     

    You cannot rely on Model A. Its lead on the training data does not carry over to the validation data, which suggests it is overfitting.
    As for Model B, have you tried training it multiple times with different parameters and with different algorithms? Preparing the training data is the most time-consuming part of any DM project, so you can collect all the cases from the training data that Model B itself did not predict correctly and see if you can tune it. Most of the time, a 10 percent sample of the training data should represent the universe. However, the distribution of classes in your training table will sometimes influence the predictions too.
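
    To make the comparison concrete, here is a minimal Python sketch that scores two models on both data sets and picks the one with the best validation accuracy. The labels and predictions are made-up values for illustration, and `accuracy` and `pick_by_validation` are hypothetical helper names, not anything provided by Analysis Services.

```python
def accuracy(actual, predicted):
    """Fraction of cases where the prediction matches the actual label."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("cannot compute accuracy on an empty data set")
    hits = sum(1 for a, p in zip(actual, predicted) if a == p)
    return hits / len(actual)


def pick_by_validation(scores):
    """Return the model name with the highest validation accuracy.

    scores maps model name -> (training accuracy, validation accuracy).
    """
    return max(scores, key=lambda name: scores[name][1])


# Illustrative labels only; not taken from any real mining model.
train_actual = ["yes", "no", "yes", "no", "yes", "no", "yes", "no"]
valid_actual = ["yes", "no", "no", "yes"]

# Model A reproduces the training labels exactly but does poorly on validation.
model_a_train = ["yes", "no", "yes", "no", "yes", "no", "yes", "no"]
model_a_valid = ["yes", "yes", "yes", "no"]
# Model B is less exact on training data but holds up on validation.
model_b_train = ["yes", "no", "yes", "yes", "yes", "no", "no", "no"]
model_b_valid = ["yes", "no", "no", "no"]

scores = {
    "A": (accuracy(train_actual, model_a_train), accuracy(valid_actual, model_a_valid)),
    "B": (accuracy(train_actual, model_b_train), accuracy(valid_actual, model_b_valid)),
}

for name, (train_acc, valid_acc) in scores.items():
    # A large positive gap means the model does much worse on unseen data.
    gap = train_acc - valid_acc
    print(f"Model {name}: train={train_acc:.2f} valid={valid_acc:.2f} gap={gap:.2f}")

best = pick_by_validation(scores)
print("Deploy candidate:", best)
```

    A large gap between training and validation accuracy, as Model A shows in this toy data, is the usual warning sign that a model has memorized its training cases.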




    hth,


    Rok
  • Tuesday, November 03, 2009 3:29 PM, desserts
     

    So are you saying that I should check Model B first and make sure that the training data is fine-tuned before I use it?
    Any recommendation on how I can fine-tune it?
    Or have you actually experienced this before? Care to share?
    Thanks.
  • Tuesday, November 03, 2009 4:49 PM, rok1


     I would check a couple of things:

    1. Try training with different parameters (e.g. disable feature selection, or try different values for the input/output/states attribute settings) and see if it improves your accuracy.
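
    A parameter sweep like this can be sketched in a few lines of Python. This is only an illustration: the parameter names `feature_selection` and `max_states` are hypothetical placeholders, and `fake_train_and_score` is a stand-in that you would replace with code that actually retrains the model and measures its accuracy on validation data.

```python
from itertools import product


def sweep(param_grid, train_and_score):
    """Try every parameter combination and return (best_params, best_score).

    train_and_score is a caller-supplied function that trains a model with
    the given parameters and returns its accuracy on validation data.
    """
    names = sorted(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[n] for n in names)):
        params = dict(zip(names, values))
        score = train_and_score(params)
        # Strict comparison keeps the first combination when scores tie.
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical parameter names and values, for illustration only.
grid = {
    "feature_selection": [True, False],
    "max_states": [50, 100, 200],
}


def fake_train_and_score(params):
    """Stand-in scorer so the sketch runs; replace with real training."""
    score = 0.70
    if not params["feature_selection"]:
        score += 0.05
    if params["max_states"] == 100:
        score += 0.03
    return round(score, 2)


best_params, best_score = sweep(grid, fake_train_and_score)
print(best_params, best_score)
```

    Scoring each run on validation data rather than training data keeps the sweep from simply rewarding the most overfitted configuration.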

    You asked: "Any recommendation on how I can fine-tune?"
    1. Validate your training data (fine-tune): collect the cases from the training data that Model B could not predict. Depending on your requirements, try to consolidate them with other similar cases, or even train without those cases, and see if it improves your predictions.
    2. Try a different algorithm to see whether it can predict the cases that Model B could not, and try to implement a waterfall approach in which you use different algorithms and different models to improve your results.
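
    The two steps above can be sketched roughly as follows in Python. This is one possible reading of the waterfall idea, under the assumption that each model reports a confidence with its prediction and that a case falls through to the next model when the confidence is too low. The models here are toy functions and every name (`collect_misses`, `waterfall_predict`, `min_confidence`) is hypothetical.

```python
def collect_misses(cases, model):
    """Return the labelled cases that the model predicts incorrectly."""
    return [(features, label) for features, label in cases
            if model(features)[0] != label]


def waterfall_predict(features, models, min_confidence=0.8):
    """Ask each model in order; accept the first confident prediction.

    Each model is a callable returning (label, confidence). If no model
    is confident enough, the last model's prediction is returned.
    """
    if not models:
        raise ValueError("at least one model is required")
    label = None
    for model in models:
        label, confidence = model(features)
        if confidence >= min_confidence:
            break
    return label


# Toy stand-in models, not real mining models.
def model_b(features):
    # Always answers "high", but is confident only for large amounts.
    if features["amount"] > 100:
        return ("high", 0.9)
    return ("high", 0.5)


def fallback_model(features):
    # A second model built with a different algorithm would go here.
    if features["amount"] <= 100:
        return ("low", 0.85)
    return ("high", 0.85)


training_cases = [
    ({"amount": 150}, "high"),
    ({"amount": 40}, "low"),
    ({"amount": 90}, "low"),
]

# Step 1: find the training cases Model B gets wrong.
misses = collect_misses(training_cases, model_b)
print(len(misses), "cases missed by Model B")

# Step 2: let a second model handle the cases Model B is unsure about.
predictions = [waterfall_predict(f, [model_b, fallback_model])
               for f, _ in training_cases]
print(predictions)
```

    Inspecting the collected misses first tells you whether a fallback model is worth building at all, or whether the missed cases should instead be consolidated or dropped from the training data.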

    It all depends on your requirements and on how you combine the power of Analysis Services data mining with your own analysis to get the ball rolling. I have had cases where I trained my models hundreds of times on different machines and spent months tweaking the training data when I couldn't see any improvement on certain classes.



    hth,

    Rok
    • Marked As Answer by Jin Chen MSFT, Moderator, Wednesday, November 11, 2009 9:52 AM
    • Proposed As Answer by rok1, Tuesday, November 03, 2009 6:40 PM
  • Monday, November 09, 2009 3:45 PM, desserts
     
    Thanks Rok.

    I will see what I can do.