Error 0017 - Cannot process column "Id" of type System.String. The type is not supported by the module

  • Question

  • Hi,

    I get the above error message from the Extract N-Gram Features from Text module in the Predictive Experiment.  My model is a simple two-class classification problem.  The source data has two columns of text.

    I tried many ways to solve this with no success.  I also looked at the Error 0017 help page. I would appreciate any advice.

    Thanks.

    
    • Edited by C. Mak Thursday, August 29, 2019 4:14 PM
    Wednesday, August 28, 2019 11:45 PM

Answers

  • Hi C. Mak,

    Thanks for reaching out. Based on the error message, your Id column has the wrong type: it should be int instead of string.

    Please note that if you are creating your own custom vocabulary without using the module, you will need to supply the four columns (Id, Ngram, DF, IDF) with the correct types (int, string, int, double).

    The module expects the data in the following format, for example. The last line should give the total number of documents the custom vocabulary was created from. Also, as Ilya mentioned, the column types are important: Id and DF should be integers, and IDF should be floating point.

    Id,Ngram,DF,IDF
    1,good_read,1,1.0
    2,excellent_story,1,1.0
    -1,total.num.docs,10,0
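
    As a rough local sanity check before uploading a custom vocabulary, the sketch below parses a CSV and reports any value that does not match the column types listed above. The helper name check_vocabulary is hypothetical, and this is not the module's own validation; it only checks that the text in each column can be parsed as the expected type.

```python
import csv
import io

# Expected column order and a parser per column, taken from the answer above.
EXPECTED = [("Id", int), ("Ngram", str), ("DF", int), ("IDF", float)]


def check_vocabulary(csv_text):
    """Return a list of problems found in a vocabulary CSV (empty if none).

    Hypothetical helper for illustration only.
    """
    problems = []
    reader = csv.reader(io.StringIO(csv_text.strip()))
    header = next(reader)
    if header != [name for name, _ in EXPECTED]:
        return [f"unexpected header: {header}"]
    for line_no, row in enumerate(reader, start=2):
        if len(row) != len(EXPECTED):
            problems.append(f"line {line_no}: expected {len(EXPECTED)} fields")
            continue
        for (name, parse), value in zip(EXPECTED, row):
            try:
                parse(value)  # raises ValueError if the text is not of this type
            except ValueError:
                problems.append(
                    f"line {line_no}: {name}={value!r} is not {parse.__name__}"
                )
    return problems


sample = """Id,Ngram,DF,IDF
1,good_read,1,1.0
2,excellent_story,1,1.0
-1,total.num.docs,10,0
"""

# A deliberately broken row: the Id is text, not an integer.
bad = "Id,Ngram,DF,IDF\nA1,good_read,1,1.0\n"

print(check_vocabulary(sample))
print(check_vocabulary(bad))
```

    A file that passes this check can still be imported with the wrong column type, so it is worth confirming the types on the dataset itself as well.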

    Also, for consistency, when using TF-IDF weighting, the IDF should be Log10(total.num.docs / DF).

    Please give it a try and let us know if you run into any further issues.
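
    To illustrate the IDF formula, here is a minimal sketch that builds vocabulary rows from per-n-gram document frequencies. The function name build_vocabulary and the input dictionary are assumptions for illustration; only the column layout, the final total.num.docs row, and the Log10 formula come from the answer above.

```python
import math


def build_vocabulary(doc_freqs, total_docs):
    """Build (Id, Ngram, DF, IDF) rows with IDF = log10(total_docs / DF).

    Hypothetical helper: doc_freqs maps each n-gram to the number of
    documents that contain it.
    """
    rows = []
    for ngram_id, (ngram, df) in enumerate(doc_freqs.items(), start=1):
        rows.append((ngram_id, ngram, df, math.log10(total_docs / df)))
    # Final row carries the document count, as in the sample above.
    rows.append((-1, "total.num.docs", total_docs, 0.0))
    return rows


rows = build_vocabulary({"good_read": 1, "excellent_story": 1}, total_docs=10)
for row in rows:
    print(",".join(str(value) for value in row))
```

    With 10 documents and DF = 1, the formula gives Log10(10 / 1) = 1, which matches the IDF values in the sample rows.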

    Regards,

    Yutong


    Thursday, August 29, 2019 5:28 PM
    Moderator

All replies

  • Hello Yutong,

    Thanks so much for your advice.  

    The error message was the result of my own mistake.  I generated the Results dataset from the Extract N-Gram Features from Text module and used it in the predictive experiment.  I should have used the Result vocabulary instead.

    Sorry for the confusion.

    Regards,

    C. Mak


    Thursday, September 5, 2019 6:26 PM