'must contain all features' error when retraining a text classification experiment that uses a storage blob


  • Hi,

    I followed the Text Classification sample to create a TF-IDF experiment using the embedded R script. In the training experiment, I store the generated dictionary in a storage blob, and the predictive web service reads it back from that blob. Everything works fine when I run it manually.
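    The key property of this setup is that TF-IDF features come from a dictionary fitted at training time and reused unchanged at scoring time. The sample does this in R; the Python sketch below is only an analogue under stated assumptions: the function names fit_dictionary and transform are hypothetical, tokenization is plain whitespace splitting, and the IDF uses one common smoothed variant, log((1 + N) / (1 + df)) + 1.

```python
import math
from collections import Counter


def fit_dictionary(docs):
    """Build a term -> IDF dictionary from the training documents (hypothetical helper)."""
    n_docs = len(docs)
    doc_freq = Counter()
    for doc in docs:
        # Count each term at most once per document (document frequency).
        doc_freq.update(set(doc.lower().split()))
    # Smoothed IDF, one common variant: log((1 + N) / (1 + df)) + 1.
    return {term: math.log((1 + n_docs) / (1 + df)) + 1
            for term, df in sorted(doc_freq.items())}


def transform(doc, dictionary):
    """Turn one document into TF-IDF features over exactly the dictionary's terms."""
    counts = Counter(doc.lower().split())
    total = sum(counts.values()) or 1
    # Terms not in the dictionary are dropped; dictionary terms absent from doc get 0.0.
    return {term: counts[term] / total * idf for term, idf in dictionary.items()}


if __name__ == "__main__":
    train_dict = fit_dictionary(["good movie", "bad movie"])
    print(sorted(transform("good good film", train_dict)))  # ['bad', 'good', 'movie']
    # Refitting on different data yields a different feature set.
    print(sorted(fit_dictionary(["great film"])))  # ['film', 'great']
```

    Because the feature columns are exactly the dictionary's keys, a model trained with one dictionary expects those columns, and features built from a different dictionary will not line up.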

    Then I created a new endpoint for retraining and called it from a Python program on my local machine. Retraining the training experiment and overwriting the trained model also works perfectly. But when I run the predictive experiment, Score Model reports 'The data set being scored must contain all features used during training'. I checked that the storage blob had really been updated.
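    For context, in the Studio (classic) retraining flow the "overwrite the trained model" step is typically a PATCH request to the new endpoint that names the trained-model resource and the blob holding the retrained .ilearner file. The minimal sketch below only builds that request body; the field names follow the retraining documentation as I recall it and should be checked against your endpoint's API help page, and the function name and example values are hypothetical.

```python
import json


def build_update_resource_body(resource_name, base_location, relative_location, sas_token):
    """Body for the PATCH call that points an endpoint's trained-model resource at a new .ilearner blob."""
    return {
        "Resources": [
            {
                # Hypothetical example: "Text Classification [trained model]"
                "Name": resource_name,
                "Location": {
                    "BaseLocation": base_location,          # storage account base URL
                    "RelativeLocation": relative_location,  # container/blob path of the .ilearner
                    "SasBlobToken": sas_token,              # SAS query string granting read access
                },
            }
        ]
    }


if __name__ == "__main__":
    body = build_update_resource_body(
        "Text Classification [trained model]",
        "https://myaccount.blob.core.windows.net",
        "/experimentoutput/model.ilearner",
        "?sv=...",
    )
    # Only the trained model is listed; nothing here refers to the dictionary blob.
    print(json.dumps(body, indent=2))
```

    Note that this update swaps the model only; nothing in it tells the endpoint to reload the dictionary the model was trained against.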

    My guess is that the storage blob is not refreshed in the new endpoint, so the features in the dictionary no longer match those in the trained model. How can I fix this, or is there a better way to create the TF-IDF features?
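    To make the suspected mismatch concrete, here is a minimal sketch of the kind of check behind the error: the trained model remembers its feature columns, and scoring fails if any of them is missing from the input. This is not Score Model's actual implementation; the function names and example columns are hypothetical.

```python
def missing_features(training_features, scoring_columns):
    """Return the training features absent from the scoring data, in training order."""
    present = set(scoring_columns)
    return [f for f in training_features if f not in present]


def check_scoring_schema(training_features, scoring_columns):
    """Raise if the scoring data lacks any feature the model was trained on."""
    missing = missing_features(training_features, scoring_columns)
    if missing:
        raise ValueError(
            "The data set being scored must contain all features used during training; "
            f"missing: {missing}"
        )


if __name__ == "__main__":
    # Model retrained with a new dictionary that added "plot" ...
    retrained_model_features = ["bad", "good", "movie", "plot"]
    # ... but the predictive service still builds features from the old dictionary.
    old_dictionary_columns = ["bad", "good", "movie"]
    try:
        check_scoring_schema(retrained_model_features, old_dictionary_columns)
    except ValueError as err:
        print(err)
```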

    I'd appreciate any help.

    Tuesday, March 21, 2017 3:29 AM

All replies

  • Finally fixed it.

    You must add a Web service output module for the dictionary in the training experiment, and another Web service input module for the dictionary in the predictive experiment.

    When calling the training web service from the local program, save the dictionary output to a storage blob. Then pass the dictionary from that storage blob as input when calling the predictive web service.
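    The two calls described above can be sketched as Batch Execution Service (BES) job payloads in Python, since the retraining client is a Python program. This is a minimal sketch under assumptions: the Inputs/Outputs/GlobalParameters structure with ConnectionString and RelativeLocation follows the Studio (classic) BES sample code, the port names (input_training_data, output_model, output_dictionary, input_scoring_data, input_dictionary, output_scores) are hypothetical and must match the names of the Web service input/output modules in your experiments, and submitting and starting the jobs is omitted.

```python
def blob_ref(connection_string, relative_location):
    """One BES input/output entry: a storage connection string plus a /container/blob path."""
    return {"ConnectionString": connection_string, "RelativeLocation": relative_location}


def training_job_payload(conn, training_data, model_blob, dictionary_blob):
    """Training job: writes both the retrained model and the dictionary to blobs."""
    return {
        "Inputs": {"input_training_data": blob_ref(conn, training_data)},
        "Outputs": {
            "output_model": blob_ref(conn, model_blob),
            # Exposed by the extra Web service output module for the dictionary.
            "output_dictionary": blob_ref(conn, dictionary_blob),
        },
        "GlobalParameters": {},
    }


def predictive_job_payload(conn, scoring_data, dictionary_blob, results_blob):
    """Predictive job: receives the dictionary explicitly instead of relying on a stored copy."""
    return {
        "Inputs": {
            "input_scoring_data": blob_ref(conn, scoring_data),
            # Fed by the extra Web service input module for the dictionary.
            "input_dictionary": blob_ref(conn, dictionary_blob),
        },
        "Outputs": {"output_scores": blob_ref(conn, results_blob)},
        "GlobalParameters": {},
    }


if __name__ == "__main__":
    conn = "DefaultEndpointsProtocol=https;AccountName=...;AccountKey=..."  # placeholder
    dictionary_blob = "/mycontainer/dictionary.csv"  # hypothetical path shared by both jobs
    train = training_job_payload(conn, "/mycontainer/train.csv", "/mycontainer/model.ilearner", dictionary_blob)
    score = predictive_job_payload(conn, "/mycontainer/score.csv", dictionary_blob, "/mycontainer/scores.csv")
    print(train["Outputs"]["output_dictionary"] == score["Inputs"]["input_dictionary"])  # True
```

    Passing the same dictionary blob path to both jobs is what ties the scoring features to the dictionary from the same training run as the model.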

    It seems that different endpoints don't share data, even when they use the same storage blob. I need to study the endpoint mechanism further.

    Wednesday, March 29, 2017 1:48 AM