Monday, July 30, 2012 9:17 AM
Hello all,
I'm working through the SSAS 2012 Data Mining Basic tutorial and would like to know why the HoldOutSeed property in the Mining Structure is set to 12 in the tutorial, and what this value actually signifies.
If I complete the tutorial, then change the seed value (to something like 24) and re-process, I get different strong influencing factors in the Decision Tree results (e.g. region swings from Europe to North America!).
MSDN isn't really helping me understand what this value signifies.
If anyone can help me understand that would be much appreciated.
Monday, July 30, 2012 9:44 AM
Generally, most pseudo-random number generators need a seed, which is the starting point of the generator. If you start generating numbers from the same initial seed, you get the same sequence of numbers. This is easy to check: run "SELECT RAND(12) as first, RAND() as second, RAND() as third" and you will see the same "random" numbers on every run, which is why they are called pseudo-random.
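The same behaviour can be reproduced outside SQL Server. Here is a minimal Python sketch using the standard library's generator; the helper name first_three is made up for illustration, and this is not the generator SSAS uses internally, it only shows that a fixed seed fixes the sequence.

```python
import random


def first_three(seed):
    """Return the first three numbers produced by a generator started from `seed`."""
    rng = random.Random(seed)  # private generator, so global state is untouched
    return [rng.random() for _ in range(3)]


run_a = first_three(12)
run_b = first_three(12)  # same seed again
run_c = first_three(24)  # different seed

print(run_a == run_b)  # same seed, same sequence
print(run_a == run_c)  # different seed, different sequence
```

The first comparison is true and the second is false, no matter how many times you run the script.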
In the tutorial, HoldOutSeed is set to 12 so that you get the same result on every run: the whole data set is always divided into the training set and the test set in the same way. That lets you compare your results with theirs and check that you have done everything correctly while they are teaching you.
By changing HoldOutSeed to 24, you trained your model on a different training set. This is why cross-validation is useful: it checks whether your results depend on how the data happened to be divided into training and test sets.
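To make the effect of the seed on the split concrete, here is a rough Python sketch. It is only an illustration under my own assumptions: the function names holdout_split and k_folds, the 1000 case ids and the 30% holdout fraction are invented for the example, and SSAS does its partitioning internally in its own way.

```python
import random


def holdout_split(case_ids, holdout_fraction, seed):
    """Shuffle the cases with a seeded generator and cut off a test set."""
    rng = random.Random(seed)
    shuffled = list(case_ids)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * holdout_fraction)
    return shuffled[n_test:], shuffled[:n_test]  # (training set, test set)


def k_folds(case_ids, k, seed):
    """Partition the cases into k disjoint folds; each fold is the test set once."""
    rng = random.Random(seed)
    shuffled = list(case_ids)
    rng.shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]


cases = range(1000)  # hypothetical case ids

train_12, test_12 = holdout_split(cases, 0.3, seed=12)
train_12_again, _ = holdout_split(cases, 0.3, seed=12)
train_24, test_24 = holdout_split(cases, 0.3, seed=24)

print(train_12 == train_12_again)      # same seed: identical training set
print(set(train_12) == set(train_24))  # different seed: different training cases

folds = k_folds(cases, k=5, seed=12)
print([len(fold) for fold in folds])   # five folds that together cover every case
```

With seed 12 the model always learns from the same cases; with seed 24 it learns from a different subset, so a tree that is sensitive to the sample can pick different top attributes. Cross-validation repeats the train/test cycle over several folds so you can see how much the results move.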
Tuesday, July 31, 2012 1:34 PM
Thanks for your reply koles.
So I should process models on different HoldOutSeeds and use cross-validation methods to validate each model?
Unmarked as answered for the next few hours just in case anyone has anything further to add, then I'll mark as answered.