AzureML time series model cannot recognize feature values on inference

  • Question

  • Same post on Stackoverflow: https://stackoverflow.com/questions/59204799/azureml-time-series-model-cannot-recognize-feature-values-on-inference

    I have downloaded a trained model from Azure Machine Learning. It was trained with Automated ML, using the Time Series forecasting preset.

    When I want to run predictions, I get this message:

    ```
    NumericalizeTransformer: Column AircraftModel contains categories not present at fit: {('42',)}. These categories will be set to NA prior to encoding.
      .format(col, new_cats))
    Column Operator contains categories not present at fit: {('US Airlines',)}. These categories will be set to NA prior to encoding.
      .format(col, new_cats))
    ```

    My code for running forecast is this:

        import json

        import joblib
        import pandas as pd

        def load_model():
            global model
            model_path = 'model.pkl'
            model = joblib.load(model_path)

        def run_forecast(data):
            try:
                y_query = data.pop('y_query').values
                #y_query.fill(np.nan)
                result = model.forecast(data, y_query)
            except Exception as e:
                result = str(e)
                return json.dumps({"error": result})

            forecast_as_list = result[0].tolist()

            return forecast_as_list

        input_sample = pd.DataFrame(data=[{'AircraftId': 'ATR-0001', 'FromDate': '2016-09-01T00:00:00.000Z', 'AircraftModel': '42', 'Operator': 'US Airlines', 'Country': 'Denmark', 'MonthOfYear': 9, 'y_query': 1.0}])

        load_model()

        forecast = run_forecast(input)

    I get a result back, but it is quite bad, and I suspect the omitted feature columns are the culprit.

    Should I manually do some pre-processing before running inference on the model?

                
    Thursday, December 5, 2019 11:54 PM

All replies

  • Hi,

    Did you confirm whether the categories in your testing dataset are contained in your training dataset?

    Regards,

    GiftA-MSFT.

    If a post helps to resolve your issue, please click “Mark as Answer” and/or “Vote as helpful”. By marking a post as Answered and/or Helpful, you help others find the answer faster.  Thanks.

    Friday, December 6, 2019 11:03 PM
    Moderator
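One way to answer the question above is to compare the raw values directly. The following is a minimal sketch, not part of the original thread: `unseen_categories` is a hypothetical helper, and the training frame is invented for illustration (the real training data is not shown here). It also flags values that match only after a string cast, since a dtype difference between training and inference is one possible, unconfirmed reason for a value such as '42' to be reported as new.

```python
import pandas as pd


def unseen_categories(train: pd.DataFrame, infer: pd.DataFrame, columns):
    """Return, per column, the inference values that never occur in training."""
    report = {}
    for col in columns:
        known = set(train[col].dropna().unique())
        new = set(infer[col].dropna().unique()) - known
        if new:
            # Values that match only after casting to str suggest a dtype
            # mismatch rather than a genuinely new category.
            known_as_str = {str(v) for v in known}
            report[col] = {
                "unseen": new,
                "match_after_str_cast": {v for v in new if str(v) in known_as_str},
            }
    return report


# Hypothetical training data in which AircraftModel was read as integers.
train = pd.DataFrame(
    {"AircraftModel": [42, 72], "Operator": ["US Airlines", "Nordic Air"]}
)
# Inference row shaped like the one in the question (string '42').
infer = pd.DataFrame({"AircraftModel": ["42"], "Operator": ["US Airlines"]})

report = unseen_categories(train, infer, ["AircraftModel", "Operator"])
print(report)
```

A column that appears in the report with a non-empty `match_after_str_cast` set is worth checking for type consistency before looking anywhere else. A column that does not appear at all has no unseen values under exact comparison.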
  • Hi,

    The categories I sent to the model at inference time were definitely part of the training set. AircraftModel has only a small set of values; I tried them all and got the same warning each time.

    Monday, December 9, 2019 10:33 AM
  • Hi,

    Thanks for your feedback. Have you considered encoding your categorical variables as part of your data pre-processing steps? Please refer to the following document on Data pre-processing & featurization. Let me know if you have any more questions or concerns. Thanks.

    Regards,

    GiftA-MSFT.

    Monday, December 9, 2019 11:45 PM
    Moderator
  • Well, that article clearly states that my input parameters will be pre-processed according to the steps applied at training time?

    Automated machine learning pre-processing steps (feature normalization, handling missing data, converting text to numeric, etc.) become part of the underlying model. When using the model for predictions, the same pre-processing steps applied during training are applied to your input data automatically.

    I could do my own pre-processing of the training data, but that partly defeats the purpose of automated ML.

    Thursday, December 12, 2019 9:30 PM
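The behavior described in the warning can be illustrated with a toy encoder. This is only a sketch under assumptions: `CategoryEncoder` is hypothetical and is not AutoML's `NumericalizeTransformer`, and the category values are made up. It shows how a fitted pre-processing step reuses the categories learned at training time, so any inference value outside that set is treated as missing.

```python
import pandas as pd


class CategoryEncoder:
    """Toy stand-in for a fitted categorical encoder (not AutoML's code)."""

    def fit(self, series: pd.Series) -> "CategoryEncoder":
        # Remember the categories seen at training time.
        self.categories_ = pd.Index(series.dropna().unique())
        return self

    def transform(self, series: pd.Series) -> pd.Series:
        # Values outside the fitted categories become missing (code -1).
        cat = pd.Categorical(series, categories=self.categories_)
        return pd.Series(cat.codes, index=series.index)


# Hypothetical category values, chosen only for illustration.
encoder = CategoryEncoder().fit(pd.Series(["ATR 42", "ATR 72"]))
codes = encoder.transform(pd.Series(["ATR 72", "ATR 42", "42"]))
print(codes.tolist())
```

The last value, "42", was never seen during `fit`, so it receives the missing-value code while the two known values keep their training-time codes.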
  • Hi,

    Thanks for your feedback. I wasn't suggesting that you perform your own separate pre-processing. Advanced pre-processing is an option you can enable during training. Thanks.

    Regards,

    GiftA-MSFT.

    Saturday, December 14, 2019 5:36 PM
    Moderator
  • I'm not sure I get your point, sorry.

    I presume (as per docs) AutoML will perform automatic feature normalization, which becomes part of the underlying model.

    In which case *should* you choose "Advanced pre-processing" instead of letting AutoML do it?

    Monday, December 16, 2019 8:28 PM
  • Hi,

    AutoML performs standard pre-processing by default. However, you can enable advanced pre-processing, which handles things like encoding categorical variables. Furthermore, your code appears to run the forecast on "input" instead of "input_sample"; is that a typo? If possible, please share more details about your model by publishing it in Azure AI Gallery so we can troubleshoot further. One thing to point out is that if a feature class or value has very low representation in the training data, it can affect prediction results. Hope this helps. Thanks.

    Regards,

    GiftA-MSFT.

    Monday, December 16, 2019 9:10 PM
    Moderator
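For the `input` versus `input_sample` point, a corrected call might look like the sketch below. `StubModel` is a hypothetical stand-in so the example runs without `model.pkl`; only the `forecast(X, y_query)` call shape is taken from the question. The function copies the frame before popping `y_query` and returns a JSON string on both the success and error paths.

```python
import json

import numpy as np
import pandas as pd


class StubModel:
    """Hypothetical stand-in with the forecast(X, y_query) call from the question."""

    def forecast(self, X, y_query):
        # Return a constant forecast per row plus the unchanged features.
        return np.full(len(X), 1.5), X


def run_forecast(model, data: pd.DataFrame) -> str:
    """Run the forecast on a copy of `data` and always return a JSON string."""
    features = data.copy()  # leave the caller's frame untouched
    y_query = features.pop("y_query").to_numpy(dtype=float)
    try:
        result = model.forecast(features, y_query)
    except Exception as exc:
        return json.dumps({"error": str(exc)})
    return json.dumps({"forecast": result[0].tolist()})


input_sample = pd.DataFrame(
    [
        {
            "AircraftId": "ATR-0001",
            "FromDate": "2016-09-01T00:00:00.000Z",
            "AircraftModel": "42",
            "Operator": "US Airlines",
            "Country": "Denmark",
            "MonthOfYear": 9,
            "y_query": 1.0,
        }
    ]
)

# Pass the sample frame explicitly rather than the built-in `input`.
response = run_forecast(StubModel(), input_sample)
print(response)
```

Passing the model and the frame as arguments avoids the global lookup that made the `input` slip easy to miss.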