none
Infinite loop? on insert for naive bayes classifier

    Вопрос

  • Hello,

    I'm writing a C# program that is supposed to insert data into a database and then create a mining structure/model and train it. What I'm getting is that when I do the INSERT INTO, the analysis service starts churning and churning and memory starts to grow until it eventually the page file runs low and it crashes (after an hour). If I do it in Sql Server Business Intelligence Studio, the training takes 30 seconds and everything works fine.

    Here is some of my SQL to create everything.

    "CREATE MINING STRUCTURE [MiningModel] (
    [id]    LONG    KEY,
    [code] TEXT DISCRETE,
    [itemdescriptionnt1] TABLE
    ([id] LONG KEY,
    [itemdescriptionnc1] TEXT DISCRETE))"

    "ALTER MINING STRUCTURE [MiningModel]
    ADD MINING MODEL [MiningModel] (
    [id],
    [code] PREDICT ,
    [itemdescriptionnt1]
    ([id],[itemdescriptionnc1]))
    USING Microsoft_Naive_Bayes(MAXIMUM_INPUT_ATTRIBUTES=0,MAXIMUM_OUTPUT_ATTRIBUTES=0,MAXIMUM_STATES=0)"

    "INSERT INTO MINING STRUCTURE [MiningModel]
    ([id],
    [code],
    [itemdescriptionnt1]([id],[itemdescriptionnc1]))
    SHAPE {OPENQUERY(MiningModelInput,'select id, code from Transactions')}
    APPEND ( { OPENQUERY(MiningModelInput,'SELECT id, itemdescriptionnc1 FROM NestedTransactions') }
    RELATE [id] to [id]) AS itemdescriptionnt1"

    Can anyone see anything wrong with this? Especially with the INSERT INTO statement?

    Basically what this does is that it reads in transaction data using 2 tables. The case table is "Transactions" and the nested table is "NestedTransactions". Transactions are assigned a code (that I want to predict). Items are put into the nested table.

    Any help is appreciated. Thank you.

    mj



    • Изменено mjaskiewicz 5 марта 2012 г. 19:50
    5 марта 2012 г. 17:35

Ответы

  • BIDS sends XMLA queries (they look like XML). Those queries go though a different code path than DMX queries.

    DMX queries create a copy of the data the model will be trained on and that is why you are running out of memory.

    If adding more memory to the computer is not an option, you need to use XMLA queries instead of DMX. (Create data source, data source view and use it when training the model. Do exactly what BIDS is doing).


    Tatyana Yakushev [PredixionSoftware.com]

    • Помечено в качестве ответа mjaskiewicz 6 марта 2012 г. 18:32
    6 марта 2012 г. 18:25
  • Actually, I figured it out. I changed the DMX slightly and it started working.

    I changed the reference to ID in the nested table to SKIP.

    "INSERT INTO MINING STRUCTURE [MiningModel]
    ([id],
    [code],
    [itemdescriptionnt1](
    SKIP,
    [itemdescriptionnc1]))
    SHAPE {OPENQUERY(MiningModelInput,'select id, code from Transactions')}
    APPEND ( { OPENQUERY(MiningModelInput,'SELECT id, itemdescriptionnc1 FROM NestedTransactions') }
    RELATE [id] to [id]) AS itemdescriptionnt1"

    • Помечено в качестве ответа mjaskiewicz 6 марта 2012 г. 21:35
    6 марта 2012 г. 21:35

Все ответы

  • Are you training model in BI Dev Studio and in your code on the same data? You can run Profiler when you train the model in BI Dev Studio and look at the query being sent (It might be XMLA instead of DMX). I should be able to run the same query that BI Dev Studio runs to train the model.

    Tatyana Yakushev [PredixionSoftware.com]

    5 марта 2012 г. 22:02
  • Tatyana,

    Thank you for the reply. I compared a trace from Profiler and I see that BI doesn't issue a DMX query and so I can't compare mine against it. When it is running though, I can see that both read from the (same) data store and start training the classifier. The BI version performs some SQL queries that the DMX version doesn't and it looks like my version gets really close to the end but then just gets stuck on some operation that is not included in the trace. The rest of the log just fills up with those flight recorder snapshots.

    Great suggestion though. Got any others?

    mj

    6 марта 2012 г. 1:49
  • BIDS sends XMLA queries (they look like XML). Those queries go though a different code path than DMX queries.

    DMX queries create a copy of the data the model will be trained on and that is why you are running out of memory.

    If adding more memory to the computer is not an option, you need to use XMLA queries instead of DMX. (Create data source, data source view and use it when training the model. Do exactly what BIDS is doing).


    Tatyana Yakushev [PredixionSoftware.com]

    • Помечено в качестве ответа mjaskiewicz 6 марта 2012 г. 18:32
    6 марта 2012 г. 18:25
  • Ah. Okay. I'll follow that route instead. Thanks.

    mj

    6 марта 2012 г. 18:32
  • Actually, I figured it out. I changed the DMX slightly and it started working.

    I changed the reference to ID in the nested table to SKIP.

    "INSERT INTO MINING STRUCTURE [MiningModel]
    ([id],
    [code],
    [itemdescriptionnt1](
    SKIP,
    [itemdescriptionnc1]))
    SHAPE {OPENQUERY(MiningModelInput,'select id, code from Transactions')}
    APPEND ( { OPENQUERY(MiningModelInput,'SELECT id, itemdescriptionnc1 FROM NestedTransactions') }
    RELATE [id] to [id]) AS itemdescriptionnt1"

    • Помечено в качестве ответа mjaskiewicz 6 марта 2012 г. 21:35
    6 марта 2012 г. 21:35