Analysis Services Data Mining HoldOutSeed property value
-
Monday, July 30, 2012 9:17 AM
Hello all,
I'm working through the SSAS 2012 Data Mining Basic tutorial and would like to know why the HoldOutSeed property in the Mining Structure is set to 12 in the tutorial and what this value actually signifies:
http://msdn.microsoft.com/en-us/library/cc879282.aspx
If I complete the tutorial, then change the seed value (to something like 24) and re-process, I get different strong influencing factors in the Decision Tree results (e.g. region swings from Europe to North America!).
MSDN isn't really helping me understand what this value signifies.
http://technet.microsoft.com/en-us/library/microsoft.analysisservices.miningstructure.holdoutseed.aspx
If anyone can help me understand that would be much appreciated.
Many thanks
All Replies
-
Monday, July 30, 2012 9:44 AM
Hi,
Generally, most pseudo-random number generators need a seed, which is the starting point of the generator. If you start generating numbers from the same seed, you will get the same numbers. This is easy to check: run "SELECT RAND(12) as first, RAND() as second, RAND() as third" and you will see the same random numbers on every run, which is why they are called pseudo-random.
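The same behaviour can be seen outside SQL Server. Below is a minimal Python sketch using the standard library generator; the helper name first_three is made up for this illustration, and the numbers it produces have nothing to do with the ones RAND or Analysis Services would generate.

```python
import random

def first_three(seed):
    """Return the first three pseudo-random numbers produced for a given seed."""
    rng = random.Random(seed)  # separate generator, seeded explicitly
    return [rng.random() for _ in range(3)]

run_a = first_three(12)
run_b = first_three(12)
run_c = first_three(24)

print(run_a == run_b)  # same seed: identical sequence on every run
print(run_a == run_c)  # different seed: a different sequence
```

Only the seed decides the sequence, so anyone who uses seed 12 sees exactly the same "random" numbers.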
In the tutorial, HoldOutSeed is set to 12 so that you get the same result every time: the whole data set is always divided into the training set and the test set in the same way. That lets you compare your results with theirs and check that you have done everything correctly.
By changing HoldOutSeed to 24, you trained your model on a different training data set. This is why cross-validation is useful: it lets you check whether your results depend on how the data was divided into training and test sets.
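To make the effect of the seed on the split concrete, here is a rough Python sketch of a seeded holdout split. It is only an illustration of the idea, not how Analysis Services partitions data internally; the function name holdout_split, the 20 case IDs and the 30 percent holdout are all assumptions chosen for the example.

```python
import random

def holdout_split(case_ids, holdout_percent, seed):
    """Split case IDs into (training, test) lists using a seeded shuffle."""
    rng = random.Random(seed)            # the seed fixes the shuffle order
    shuffled = list(case_ids)
    rng.shuffle(shuffled)
    n_test = len(shuffled) * holdout_percent // 100
    test = sorted(shuffled[:n_test])     # held out for testing
    train = sorted(shuffled[n_test:])    # used to train the model
    return train, test

cases = range(1, 21)                     # 20 hypothetical case IDs
train_12, test_12 = holdout_split(cases, 30, seed=12)
train_12_again, test_12_again = holdout_split(cases, 30, seed=12)
train_24, test_24 = holdout_split(cases, 30, seed=24)

print(test_12 == test_12_again)  # same seed: the same cases are held out
print(test_12, test_24)          # another seed usually holds out other cases
```

A model trained on train_24 has seen different cases than one trained on train_12, so it can pick different splits in the tree. If the strongest factors change a lot between seeds, the result depends heavily on the particular partition.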
Regards,
gc
- Marked As Answer by Tatyana Yakushev (Editor) Monday, July 30, 2012 5:08 PM
- Unmarked As Answer by Andy_DC Tuesday, July 31, 2012 1:31 PM
- Marked As Answer by Andy_DC Thursday, August 02, 2012 12:38 PM
-
Tuesday, July 31, 2012 1:34 PM
Thanks for your reply, koles.
So I should process models on different HoldOutSeeds and use cross-validation methods to validate each model?
http://technet.microsoft.com/en-us/library/bb895174.aspx
I've unmarked this as answered for the next few hours in case anyone has anything further to add; then I'll mark it as answered.

