Dataset contains invalid data. . ( Error 0018 )

  • Question

  • I created datasets based on the MNIST train and test datasets as part of a learning experiment to try different types of neural nets for classification and regression. I was able to complete training successfully, but when I use Score Model on the test dataset, I get the error below.

    "Please remove missing values from label column and ensure all categorical levels have corresponding value in label column in the training dataset. Dataset contains invalid data. . ( Error 0018 )"

    I cannot find any problems with the files by visual inspection. There are no missing values in the label column (or anywhere else), and all the categories appear in the label column with equal frequency in both the training and test datasets. I don't know how to get additional diagnostic information that would show where the problem is. Below is a segment of the test file. Labels are 0..7 and features are 0..255.

    Label	f0	f1	f2	f3	f4	f5	f6	f7	
    0	0	0	0	0	0	0	0	0	0	
    0	0	0	0	0	0	0	0	0	0	
    0	0	0	0	0	0	0	0	0	0	
    0	0	0	0	0	0	0	0	0	0	
    1	0	0	0	0	0	0	0	0	0	
    1	0	0	0	0	0	0	0	0	0	
    
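    One way to get more information than visual inspection gives is a small script that counts missing cells, rows whose width differs from the header, and label frequencies. This is a minimal sketch, not part of Azure ML; it assumes the data has been exported as tab-separated text with a header row whose first column is the label, as in the segment above. The function names are hypothetical.

```python
import csv
import io
from collections import Counter


def strip_trailing_empty(row):
    """Drop empty fields produced by trailing tabs (as in the segment above)."""
    while row and row[-1] == "":
        row = row[:-1]
    return row


def summarize_tsv(text):
    """Report missing cells, ragged rows, and label counts for a TSV dataset.

    Assumes a header row whose first column is the label. Data rows are
    numbered from 1 (header and blank lines excluded). A value missing at the
    very end of a row is stripped along with the trailing tabs, so it shows
    up as a ragged (short) row rather than as a missing cell.
    """
    rows = [strip_trailing_empty(r) for r in csv.reader(io.StringIO(text), delimiter="\t")]
    rows = [r for r in rows if r]  # ignore blank lines
    header, data = rows[0], rows[1:]
    label_counts = Counter()
    missing_cells = 0
    ragged_rows = []
    for index, row in enumerate(data, start=1):
        if len(row) != len(header):
            ragged_rows.append(index)
        missing_cells += sum(1 for cell in row if cell.strip() == "")
        label_counts[row[0]] += 1
    return {
        "missing_cells": missing_cells,
        "ragged_rows": ragged_rows,
        "label_counts": dict(label_counts),
    }


# Hypothetical sample: row 2 has an empty cell, row 3 has an extra value.
sample = "Label\tf0\tf1\t\n0\t0\t0\t\n1\t\t255\t\n1\t0\t0\t0\t\n"
print(summarize_tsv(sample))
# {'missing_cells': 1, 'ragged_rows': [3], 'label_counts': {'0': 1, '1': 2}}
```

    Applied to the segment as pasted above, this would report every row as ragged, because the header names 9 columns while each row holds 10 values. That is most likely just an artifact of trimming the segment for the post, but it is the kind of mismatch worth ruling out.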

    Thanks for any pointers.


    AWeathers


    • Edited by adweathers Tuesday, August 13, 2019 12:02 AM
    Tuesday, August 13, 2019 12:00 AM

Answers

  • Thanks for the suggestions. They helped me determine that there was nothing wrong with my data. The real problem was not in the dataset but in the configuration of the model: I had copied it from a different experiment and forgot to change one parameter. The original experiment had 10 categories, but the new experiment has only 8. Since the dataset contained only 8 categories, the scorer concluded that category values were missing. I updated the model configuration and the problem went away.
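    A mismatch like this can be caught before scoring by comparing the number of classes the model is configured for with the labels that actually occur. The answer does not name the parameter, so the sketch below uses a hypothetical integer `configured_classes` in its place, and assumes integer labels 0..N-1 as with MNIST digits. It is an illustration, not an Azure ML API.

```python
def check_class_count(configured_classes, labels):
    """Compare a model's configured number of classes with the labels in the data.

    `configured_classes` is a hypothetical stand-in for the model setting that
    fixes the number of output categories (the answer does not name it).
    Labels are assumed to be integers 0..configured_classes - 1.
    """
    observed = sorted(set(labels))
    expected = set(range(configured_classes))
    return {
        "observed": observed,
        # Classes the model expects but the data never contains.
        "unseen_classes": sorted(expected - set(observed)),
        # Labels in the data that the model has no output for.
        "unexpected_labels": [c for c in observed if c not in expected],
    }


# The situation described above: a configuration copied from a 10-class
# experiment, used with data containing only the digits 0..7.
labels = list(range(8)) * 3
print(check_class_count(10, labels)["unseen_classes"])  # [8, 9]
print(check_class_count(8, labels)["unseen_classes"])   # []
```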

    AWeathers


    • Marked as answer by adweathers Tuesday, August 13, 2019 4:42 PM
    • Edited by adweathers Tuesday, August 13, 2019 4:43 PM
    Tuesday, August 13, 2019 4:41 PM

All replies

  • Hello,

    Based on the error message, it looks like there could be a missing value in the label column or an invalid character causing the issue. Could you please try using the Clean Missing Data, Remove Duplicate Rows, or Edit Metadata modules to clean up the dataset?

    Here are some other common causes of error 0018:

    • The module requires a label column, but no column is marked as a label, or you have not selected a label column yet.
    • The module requires that data be categorical but your data is numeric.
    • The module requires a specific data type. For example, ratings provided to Train Matchbox Recommender can be either numeric or categorical, but cannot be floating point numbers.
    • The data is in the wrong format.
    • Imported data contains invalid characters, bad values, or out of range values.
    • The column is empty or contains too many missing values.
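    Several items on this list (missing values, invalid characters, out-of-range values) can be checked with a short script before the data reaches the module. This is a minimal sketch, not an Azure ML module; it assumes the rows are already parsed into lists of strings with the label in column 0, and it uses the ranges given in the question (labels 0..7, features 0..255). The function name is hypothetical.

```python
def find_invalid_cells(rows, label_max=7, feature_max=255):
    """Flag cells that are empty, non-integer, or out of range.

    `rows` is a list of data rows (header excluded), each a list of strings,
    with the label in column 0. Labels must be integers in 0..label_max and
    features integers in 0..feature_max. Returns (row, column, value, reason)
    tuples, with rows numbered from 1 and columns from 0.
    """
    problems = []
    for r, row in enumerate(rows, start=1):
        for c, cell in enumerate(row):
            upper = label_max if c == 0 else feature_max
            text = cell.strip()
            if text == "":
                problems.append((r, c, cell, "missing"))
                continue
            try:
                value = int(text)
            except ValueError:
                # Catches stray characters and floating-point values such as "1.5".
                problems.append((r, c, cell, "not an integer"))
                continue
            if not 0 <= value <= upper:
                problems.append((r, c, cell, "out of range"))
    return problems


rows = [["0", "0", "12"], ["3", "256", ""], ["x", "1.5", "0"]]
for problem in find_invalid_cells(rows):
    print(problem)
# (2, 1, '256', 'out of range')
# (2, 2, '', 'missing')
# (3, 0, 'x', 'not an integer')
# (3, 1, '1.5', 'not an integer')
```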

    Hope this helps to resolve the error.

    -Rohit

    Tuesday, August 13, 2019 5:21 AM
    Moderator