none
How to Determine Discretization Method used?

    Question

  • Hi

    Is there a way to query the Mining Model to determine the Discretization Method used when a model is processed, if you have left the DiscretizationMethod to Automatic?

    I can infer it if I use the RangeMin & RangeMax functions to work out the range and number of buckets used, then reverse engineer that back to my data to work out the populations of each bucket. If each bucket is equally populated then it's used EQUAL_AREAS otherwise it's gone for CLUSTERED.

    But... it would be nice if I could just collect the decision directly from the model. Anybody any the wiser?

    Cheers, James


    James Beresford @ www.bimonkey.com & @BI_Monkey
    SSIS / MSBI Consultant in Sydney, Australia
    SSIS ETL Execution Control and Management Framework @ SSIS ETL Framework on Codeplex

    Thursday, September 13, 2012 11:15 PM

All replies

  • Hi BI Monkey,

    When you left the DiscretizationMethod to Automatic, Analysis Services determines which discretization method to use. If your data mining solution uses data from an Online Analytical Processing (OLAP) cube, the data mining algorithm automatically computes the number of buckets to generate by using the following equation, where n is the number of distinct values of data in the column: Number of Buckets = sqrt(n)

    For more information about it, please see:
    Discretization Methods (Data Mining): http://msdn.microsoft.com/en-us/library/ms174512(v=sql.105).aspx

    Thanks,
    Eileen


    Please remember to mark the replies as answers if they help and unmark them if they provide no help. This can be beneficial to other community members reading the thread.

    Tuesday, September 25, 2012 6:23 AM
  • What I want to know is after the model has run, which method was chosen. The above link just tells me what option it could choose.


    James Beresford @ www.bimonkey.com & @BI_Monkey
    SSIS / MSBI Consultant in Sydney, Australia
    SSIS ETL Execution Control and Management Framework @ SSIS ETL Framework on Codeplex

    Friday, September 28, 2012 4:11 AM
  • I don't think you can get the discretization method per se. The information isn't even output in the model parameters so you have to look at the model content on a per-attribute basis.

    You can browse the bucketed values by using the Generic Content Viewer, or you can query the model content and use the NODE_DISTRIBUTION values. Then get the rows with VALUETYPE of 5.

    SELECT FLATTENED NODE_DISTRIBUTION as t
    from [MyBinnedModel].CONTENT

    The information on valuetypes is kind of buried here, about 2/3rds down the page.

    Mining Model content

    Let me know if these aren't the values you want.


    SQL Server UE, Data Mining

    Tuesday, October 02, 2012 5:04 PM
    Answerer