Accessing input data in custom data mining plug-in

  • Question

  • We are trying to create a plug-in with a new data mining algorithm, following the steps outlined in the documentation on the MSDN Web site (using the DMPluginWrapper).

    Our algorithm works on string values, so it needs the actual values of attributes (columns) not only during training but also during prediction. During training we can get the values with the function UntokenizeAttributeValue, but during prediction this does not work for a string that was not encountered during training.

    This has already been pointed out by Bogdan Crivat in one of the older posts in microsoft.public.sqlserver.datamining:

    CLR plug in algorithm
    http://groups.google.com/group/microsoft.public.sqlserver.datamining/msg/325fc80ce6a968e4?hl=en

    Is there any other way to get the input string that was used in the prediction query? Without this value our algorithm cannot make a prediction.


    Matjaz Rihtar, Josef Stefan Institute, http://kt.ijs.si/
    Thursday, July 15, 2010 9:31 AM

Answers

  • If you are using the managed plug-ins API, you could try to use a custom mining function which takes a string as a parameter, then invoke your prediction as follows:

    SELECT MyCustomFunction(T.StringColumn) FROM MyModel NATURAL PREDICTION JOIN
    (SELECT 'My Long String Here' AS StringColumn, 23 AS A, 44 AS B) AS T

    The function signature should look like:

    [MiningFunction("MyCustomFunction")]
    public object MyCustomFunction(object stringParameter, MiningCase matchedInputCase)
    {
    ...
    }

    Your stringParameter should contain the unmapped StringColumn field, while the rest of the columns will be mapped inside MiningCase. It could work as long as you have a single large text parameter (and you know how to use it inside your algorithm, i.e. one of your model columns has some kind of flag).
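
    As a rough illustration of the idea above, the body of such a function might look like the sketch below. The signature and attribute usage are copied from this answer. Everything else is an assumption: the two stand-in types exist only so the snippet compiles without the DMPluginWrapper assembly, and PredictFromText is a hypothetical placeholder for the algorithm's own string-based logic.

```csharp
using System;

// Stand-ins so this sketch compiles on its own (assumption). In a real plug-in,
// remove both declarations and use the types provided by DMPluginWrapper.
[AttributeUsage(AttributeTargets.Method)]
sealed class MiningFunctionAttribute : Attribute
{
    public MiningFunctionAttribute(string text) { }
}

sealed class MiningCase { }

class MyAlgorithmSketch
{
    [MiningFunction("MyCustomFunction")]
    public object MyCustomFunction(object stringParameter, MiningCase matchedInputCase)
    {
        // The raw, unmapped value arrives typed as object; guard against
        // nulls and non-string values before using it.
        string rawText = stringParameter as string;
        if (rawText == null)
        {
            return null; // nothing to predict from
        }

        // The remaining input columns are available through matchedInputCase.
        return PredictFromText(rawText, matchedInputCase);
    }

    // Hypothetical placeholder: the real string-based prediction goes here.
    private object PredictFromText(string rawText, MiningCase mappedColumns)
    {
        return rawText.Length; // dummy result, for illustration only
    }
}

static class Program
{
    static void Main()
    {
        // Simulates the call the server would make for the query above.
        var algorithm = new MyAlgorithmSketch();
        Console.WriteLine(algorithm.MyCustomFunction("My Long String Here", new MiningCase()));
    }
}
```

    The point of the sketch is only that the string reaches the function as an ordinary argument, so it does not have to go through the attribute tokenization that loses values unseen during training.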

    Hope this helps

    bogdan

    PS. Note that using T.ColumnName in the SELECT list is a feature supported by SQL Server 2008 or newer (but not by SQL Server 2005).


    bogdan crivat / http://www.bogdancrivat.net/dm
    • Marked as answer by Raymond-Lee Sunday, July 25, 2010 1:09 PM
    Friday, July 16, 2010 5:06 AM