How to Include Row IDs in a Predictive Output RRS feed

  • Question

  • Hi All,

    When I set up my training and predictive experiments, I include a "Select Columns From Dataset" module to remove any columns that should not be included when training the model. I run the training experiment and then create a predictive experiment. In the predictive experiment, I remove the predictor variable from the previously added "Select Columns From Dataset" module and add another instance of the same module after the "Score Model" module. There, I restrict the columns to just the Scored Labels and Scored Probabilities.

    The issue is that when I test either single or (particularly) batch execution, I don't have a reference to the original row ID. In the batch case, I have no way of tying each scoring output record back to its corresponding input record. Each input record has a row ID that uniquely identifies it, as well as other attributes that aren't necessary for model generation but that I do need once the records have been scored.

    I tried to bring in the ID column in the second "Select Columns From Dataset" module of the predictive experiment, but when I run it, it errors out because the ID was removed before being passed into the training algorithm. Can anyone advise what to do?



    Thursday, August 15, 2019 4:30 PM

All replies

  • Hello Luke,

    To get the row ID for which each prediction was made, the original ID column must be included in the "Select Columns From Dataset" module in your training experiment, and the model must be re-trained. The same applies when the predictive experiment is created. You can still remove other columns that are not required for training, as you did earlier.
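
The underlying idea, keeping the ID column in the data that flows through scoring so each output record can be tied back to its input, can be sketched outside the designer in plain Python. This is only an illustration under assumptions: `score_row` is a hypothetical stand-in for a trained model (it is not an Azure ML API), and the column names `row_id`, `visits`, and `region` are made up. In this sketch the ID is carried alongside the features rather than used as a feature, which is a choice made for the example and not something the reply specifies.

```python
# Hypothetical stand-in for a trained model; NOT the Azure ML scoring API.
def score_row(features):
    """Return (scored_label, scored_probability) for one feature dict."""
    probability = min(1.0, max(0.0, 0.1 * features["visits"]))
    return (1 if probability >= 0.5 else 0, probability)


ID_COLUMN = "row_id"          # assumed name of the identifier column
FEATURE_COLUMNS = ["visits"]  # assumed feature columns used by the model


def score_batch(rows):
    """Score each row on its features only, carrying the ID through to the output."""
    results = []
    for row in rows:
        # Only the feature columns are handed to the model.
        features = {name: row[name] for name in FEATURE_COLUMNS}
        label, probability = score_row(features)
        # The ID travels with the scored values; other columns are dropped.
        results.append({
            ID_COLUMN: row[ID_COLUMN],
            "Scored Labels": label,
            "Scored Probabilities": probability,
        })
    return results


# Made-up batch input with an ID, one feature, and one unneeded attribute.
batch = [
    {"row_id": "A-17", "visits": 2, "region": "north"},
    {"row_id": "B-03", "visits": 8, "region": "south"},
]
scored = score_batch(batch)
```

Because every scored record carries `row_id`, the output can be joined back to the original input to recover attributes such as `region` that were never passed to the model.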

    A simple regression experiment demonstrating this is available as an example experiment in the visual interface ML feature. If you would like to build it manually, you can follow the steps from this documentation.

    Friday, August 16, 2019 2:49 PM