Note: Forums will be making significant UX changes to address key usability improvements surrounding search, discoverability and navigation. To learn more about these changes please visit the announcement which can be found HERE.
Infinite loop? on insert for naive bayes classifier

Traitée Infinite loop? on insert for naive bayes classifier

  • Monday, March 05, 2012 5:35 PM
     
     

    Hello,

    I'm writing a C# program that is supposed to insert data into a database and then create a mining structure/model and train it. What I'm getting is that when I do the INSERT INTO, the analysis service starts churning and churning and memory starts to grow until it eventually the page file runs low and it crashes (after an hour). If I do it in Sql Server Business Intelligence Studio, the training takes 30 seconds and everything works fine.

    Here is some of my SQL to create everything.

    "CREATE MINING STRUCTURE [MiningModel] (
    [id]    LONG    KEY,
    [code] TEXT DISCRETE,
    [itemdescriptionnt1] TABLE
    ([id] LONG KEY,
    [itemdescriptionnc1] TEXT DISCRETE))"

    "ALTER MINING STRUCTURE [MiningModel]
    ADD MINING MODEL [MiningModel] (
    [id],
    [code] PREDICT ,
    [itemdescriptionnt1]
    ([id],[itemdescriptionnc1]))
    USING Microsoft_Naive_Bayes(MAXIMUM_INPUT_ATTRIBUTES=0,MAXIMUM_OUTPUT_ATTRIBUTES=0,MAXIMUM_STATES=0)"

    "INSERT INTO MINING STRUCTURE [MiningModel]
    ([id],
    [code],
    [itemdescriptionnt1]([id],[itemdescriptionnc1]))
    SHAPE {OPENQUERY(MiningModelInput,'select id, code from Transactions')}
    APPEND ( { OPENQUERY(MiningModelInput,'SELECT id, itemdescriptionnc1 FROM NestedTransactions') }
    RELATE [id] to [id]) AS itemdescriptionnt1"

    Can anyone see anything wrong with this? Especially with the INSERT INTO statement?

    Basically what this does is that it reads in transaction data using 2 tables. The case table is "Transactions" and the nested table is "NestedTransactions". Transactions are assigned a code (that I want to predict). Items are put into the nested table.

    Any help is appreciated. Thank you.

    mj



    • Edited by mjaskiewicz Monday, March 05, 2012 7:50 PM
    •  

All Replies

  • Monday, March 05, 2012 10:02 PM
    Answerer
     
     
    Are you training model in BI Dev Studio and in your code on the same data? You can run Profiler when you train the model in BI Dev Studio and look at the query being sent (It might be XMLA instead of DMX). I should be able to run the same query that BI Dev Studio runs to train the model.

    Tatyana Yakushev [PredixionSoftware.com]

  • Tuesday, March 06, 2012 1:49 AM
     
     

    Tatyana,

    Thank you for the reply. I compared a trace from Profiler and I see that BI doesn't issue a DMX query and so I can't compare mine against it. When it is running though, I can see that both read from the (same) data store and start training the classifier. The BI version performs some SQL queries that the DMX version doesn't and it looks like my version gets really close to the end but then just gets stuck on some operation that is not included in the trace. The rest of the log just fills up with those flight recorder snapshots.

    Great suggestion though. Got any others?

    mj

  • Tuesday, March 06, 2012 6:25 PM
    Answerer
     
     Answered

    BIDS sends XMLA queries (they look like XML). Those queries go though a different code path than DMX queries.

    DMX queries create a copy of the data the model will be trained on and that is why you are running out of memory.

    If adding more memory to the computer is not an option, you need to use XMLA queries instead of DMX. (Create data source, data source view and use it when training the model. Do exactly what BIDS is doing).


    Tatyana Yakushev [PredixionSoftware.com]

    • Marked As Answer by mjaskiewicz Tuesday, March 06, 2012 6:32 PM
    •  
  • Tuesday, March 06, 2012 6:32 PM
     
     

    Ah. Okay. I'll follow that route instead. Thanks.

    mj

  • Tuesday, March 06, 2012 9:35 PM
     
     Answered

    Actually, I figured it out. I changed the DMX slightly and it started working.

    I changed the reference to ID in the nested table to SKIP.

    "INSERT INTO MINING STRUCTURE [MiningModel]
    ([id],
    [code],
    [itemdescriptionnt1](
    SKIP,
    [itemdescriptionnc1]))
    SHAPE {OPENQUERY(MiningModelInput,'select id, code from Transactions')}
    APPEND ( { OPENQUERY(MiningModelInput,'SELECT id, itemdescriptionnc1 FROM NestedTransactions') }
    RELATE [id] to [id]) AS itemdescriptionnt1"

    • Marked As Answer by mjaskiewicz Tuesday, March 06, 2012 9:35 PM
    •