ML.NET Pipeline with ImageResizer fails on CrossValidate (ArgumentException)

  • Question

  • Hello,

    I've been trying to set up a pipeline for an image classification web API, but I'm getting the following error:

    ArgumentException: Parameter is not valid.

    • System.Drawing.Image.get_Height()

    • Microsoft.ML.Transforms.Image.ImageResizingTransformer+Mapper+<>c__DisplayClass3_0.<MakeGetter>b__1(ref Bitmap dst)

    • Microsoft.ML.Transforms.Image.ImagePixelExtractingTransformer+Mapper+<>c__DisplayClass5_0<TValue>.<GetGetterCore>b__1(ref VBuffer<TValue> dst)

    • Microsoft.ML.Data.DataViewUtils+Splitter+InPipe+Impl<T>.Fill()

    • Microsoft.ML.Data.DataViewUtils+Splitter+<>c__DisplayClass9_0.<SplitCore>b__1()

    The weird thing is that it fails only when I cross-validate. Fitting the pipeline directly on the same data works. My pipeline looks like this:

    var pipeline = _mlContext.Transforms.Conversion.MapValueToKey(LabelToKey, "Label")
        .Append(_mlContext.Transforms.ResizeImages(outputColumnName: TensorFlowModelSettings.inputTensorName, imageWidth: ImageSettings.imageWidth, imageHeight: ImageSettings.imageHeight, inputColumnName: nameof(ImageInputData.Image)))
        .Append(_mlContext.Transforms.ExtractPixels(outputColumnName: TensorFlowModelSettings.inputTensorName, interleavePixelColors: ImageSettings.channelsLast, offsetImage: ImageSettings.mean))
        .Append(_mlContext.Model.LoadTensorFlowModel(tensorFlowModelFilePath)
            .ScoreTensorFlowModel(outputColumnNames: new[] { TensorFlowModelSettings.outputTensorName },
                                  inputColumnNames: new[] { TensorFlowModelSettings.inputTensorName }, addBatchDimensionInput: false))
        .Append(_mlContext.MulticlassClassification.Trainers.LbfgsMaximumEntropy(labelColumnName: LabelToKey, featureColumnName: TensorFlowModelSettings.outputTensorName))
        .Append(_mlContext.Transforms.Conversion.MapKeyToValue(PredictedLabelValue, "PredictedLabel"))
        .AppendCacheCheckpoint(_mlContext);

    var cvResults = _mlContext.MulticlassClassification.CrossValidate(trainData, pipeline, numberOfFolds, labelColumnName: LabelToKey);

    // Select all models
    ITransformer[] mlModels =
        cvResults
            .OrderBy(fold => fold.Metrics.LogLoss)
            .Select(fold => fold.Model)
            .ToArray();

    // Get Top Model
    //ITransformer mlModel = pipeline.Fit(trainData);
    ITransformer mlModel = mlModels[0];

    This is based on the ImageClassification sample on https://github.com/dotnet/machinelearning-samples/tree/master/samples/csharp/end-to-end-apps/DeepLearning_ImageClassification_TensorFlow

    Does anybody have an idea?

    Thanks in advance.

    Saturday, August 17, 2019 2:45 AM

All replies

  • Update: the problem seems to come from the cross-validation itself rather than from any operation in the pipeline. The modified pipeline below works:

    var pipeline = _mlContext.Transforms.Conversion.MapValueToKey("Label")
        .Append(_mlContext.Transforms.ResizeImages(outputColumnName: TensorFlowModelSettings.inputTensorName, imageWidth: ImageSettings.imageWidth, imageHeight: ImageSettings.imageHeight, inputColumnName: nameof(ImageInputData.Image)))
        .Append(_mlContext.Transforms.ExtractPixels(outputColumnName: TensorFlowModelSettings.inputTensorName, interleavePixelColors: ImageSettings.channelsLast, offsetImage: ImageSettings.mean/*, inputColumnName: nameof(ImageInputData.Image)*/))
        .Append(_mlContext.Model.LoadTensorFlowModel(tensorFlowModelFilePath)
            .ScoreTensorFlowModel(outputColumnNames: new[] { TensorFlowModelSettings.outputTensorName },
                                  inputColumnNames: new[] { TensorFlowModelSettings.inputTensorName }, addBatchDimensionInput: false))
        .Append(_mlContext.MulticlassClassification.Trainers.LbfgsMaximumEntropy(labelColumnName: "Label", featureColumnName: TensorFlowModelSettings.outputTensorName))
        .Append(_mlContext.Transforms.Conversion.MapKeyToValue(PredictedLabelValue, "PredictedLabel"))
        .AppendCacheCheckpoint(_mlContext);

    var split = _mlContext.Data.TrainTestSplit(trainData, testFraction: 0.2);

    ITransformer mlModel = pipeline.Fit(split.TrainSet);
    var k = mlModel.Transform(split.TestSet);
    var metrics = _mlContext.MulticlassClassification.Evaluate(k, labelColumnName: "Label");

    return mlModel;

    In other words, TrainTestSplit, which amounts to roughly a single fold of k-fold cross-validation, works, while calling CrossValidate(trainData, pipeline, numberOfFolds) does not. Could CrossValidate be corrupting the data somehow?

    Right now, I think I may have to implement my own cross-validation. I'm still hoping someone can shed some light on this, though.
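    A hand-rolled cross-validation mainly needs a k-fold partition of the training examples. The following is a minimal C# sketch, assuming the images are already held in memory as a list (for example the ImageInputData objects that would be passed to LoadFromEnumerable); KFold and Split are hypothetical names, not ML.NET APIs. With k = 5, each fold holds out 20% of the rows, the same share as testFraction: 0.2 in the TrainTestSplit code above, but every row ends up in exactly one test set.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper (not part of ML.NET): deterministic k-fold partitioning
// of an in-memory list of examples.
public static class KFold
{
    // Returns one (Train, Test) pair per fold. Rows are shuffled once with the
    // given seed, then row i of the shuffled order goes to the test set of fold i % k.
    public static List<(List<T> Train, List<T> Test)> Split<T>(IReadOnlyList<T> items, int k, int seed = 0)
    {
        if (k < 2 || k > items.Count)
            throw new ArgumentOutOfRangeException(nameof(k), "k must be between 2 and the number of items.");

        // Fisher-Yates shuffle of the row indices so folds are not ordered by class or file name.
        var rng = new Random(seed);
        int[] order = Enumerable.Range(0, items.Count).ToArray();
        for (int i = order.Length - 1; i > 0; i--)
        {
            int j = rng.Next(i + 1);
            (order[i], order[j]) = (order[j], order[i]);
        }

        var folds = new List<(List<T> Train, List<T> Test)>(k);
        for (int f = 0; f < k; f++)
        {
            var train = new List<T>();
            var test = new List<T>();
            for (int i = 0; i < order.Length; i++)
            {
                // Each row is held out in exactly one fold and used for training in the others.
                if (i % k == f) test.Add(items[order[i]]);
                else train.Add(items[order[i]]);
            }
            folds.Add((train, test));
        }
        return folds;
    }
}
```

    Each fold's Train and Test lists could then be turned into IDataViews with _mlContext.Data.LoadFromEnumerable(...), fitted and evaluated the same way as the TrainTestSplit version, and the per-fold metrics averaged. Whether this sidesteps the ArgumentException has not been verified.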

    Saturday, August 17, 2019 8:55 PM
  • Hi Sandro,

    Thank you for reaching out. This forum is for questions related to Azure Machine Learning Service, so please post your question to https://github.com/dotnet/machinelearning/issues, where someone from the ML.NET team can help you. Thank you for your understanding.

    Regards,

    Yutong

    Monday, August 19, 2019 8:38 PM
    Moderator