How to use String as a Predicted Column

  • Question

  • Goal:
    Make a prediction on PaymentType. The target name or predicted column is a string value.

    Problem:
    I get this error message: "ArgumentOutOfRangeException: Training label column 'Label' type isn't suitable for regression: Text. Type must be R4 or R8. Parameter name: data"

    What part of the source code am I missing?

    Thank you!

    Data:
    https://github.com/dotnet/machinelearning/blob/master/test/data/taxi-fare-train.csv

    https://github.com/dotnet/machinelearning/blob/master/test/data/taxi-fare-test.csv

    Info:
    I'm new to ML.NET.

    Code:

    using Microsoft.ML;
    using Microsoft.ML.Data;
    using Microsoft.ML.Models;
    using Microsoft.ML.Trainers;
    using Microsoft.ML.Transforms;
    using System;
    using System.Threading.Tasks;
    
    namespace TaxiFarePrediction
    {
        class Program
        {
            const string _datapath = @".\Data\taxi-fare-train.csv";
            const string _testdatapath = @".\Data\taxi-fare-test.csv";
            const string _modelpath = @".\Data\Model.zip";
    
            static async Task Main(string[] args)
            {
                PredictionModel<TaxiTrip, TaxiTripFarePrediction> model = await Train();
                Evaluate(model);
    
                var prediction = model.Predict(TestTrips.Trip1);
                Console.WriteLine("Predicted fare: {0}", prediction.PaymentType);
    
                Console.ReadLine();
            }
    
            static async Task<PredictionModel<TaxiTrip, TaxiTripFarePrediction>> Train()
            {
                // Create learning pipeline
                var pipeline = new LearningPipeline
                {
                    // Load and transform data
                    new TextLoader(_datapath).CreateFrom<TaxiTrip>(separator: ','),
    
                     // Labeling
                    new ColumnCopier(("PaymentType", "Label")),
    
                    // Feature engineering
                    new CategoricalOneHotVectorizer("VendorId",
                        "RateCode",
                        "PaymentType"),
    
                     // Combine features in a single vector
                    new ColumnConcatenator("Features",
                        "VendorId",
                        "RateCode",
                        "PassengerCount",
                        "TripDistance",
                        "FareAmount"),
    
                    // Add learning algorithm
                    new FastTreeRegressor()
                };
    
                // Train the model
                PredictionModel<TaxiTrip, TaxiTripFarePrediction> model = pipeline.Train<TaxiTrip, TaxiTripFarePrediction>();
    
                // Save the model to a zip file
                await model.WriteAsync(_modelpath);
    
                return model;
            }
    
            private static void Evaluate(PredictionModel<TaxiTrip, TaxiTripFarePrediction> model)
            {
                // Load test data
                var testData = new TextLoader(_testdatapath).CreateFrom<TaxiTrip>(useHeader: true, separator: ',');
    
                // Evaluate test data
                var evaluator = new RegressionEvaluator();
                RegressionMetrics metrics = evaluator.Evaluate(model, testData);
    
                // Display regression evaluation metrics
                Console.WriteLine("Rms=" + metrics.Rms);
                Console.WriteLine("RSquared = " + metrics.RSquared);
            }
        }
    }

    namespace TaxiFarePrediction
    {
        static class TestTrips
        {
            internal static readonly TaxiTrip Trip1 = new TaxiTrip
            {
                VendorId = "VTS",
                RateCode = "1",
                PassengerCount = 1,
                TripDistance = 10.33f,
                PaymentType = "", 
                FareAmount = 7,
                //FareAmount = 0 // predict it. actual = 29.5
            };
        }
    }

    using Microsoft.ML.Runtime.Api;
    
    namespace TaxiFarePrediction
    {
        public class TaxiTrip
        {
            [Column("0")]
            public string VendorId;
    
            [Column("1")]
            public string RateCode;
    
            [Column("2")]
            public float PassengerCount;
    
            [Column("3")]
            public float TripTime;
    
            [Column("4")]
            public float TripDistance;
    
            [Column("5")]
            public string PaymentType;
    
            [Column("6")]
            public float FareAmount;
        }
    
    }

    using Microsoft.ML.Runtime.Api;
    
    namespace TaxiFarePrediction
    {
        public class TaxiTripFarePrediction
        {
    
            /*
            [ColumnName("Score")]
            public float FareAmount;
    
            */
            [ColumnName("Score")]
            public string PaymentType;
        }
    }

    Thursday, October 24, 2019 12:51 PM

All replies

  • Hi Sakura,

    Are you following the quickstart from the ML.NET documentation? I ran the solution as described in the documentation and it works as expected.

    The issue appears to come from ColumnCopier. Could you try using CopyColumns as described in the documentation? The error seen in this case is explained in this issue. If you keep ColumnCopier, try vectorizing the string values first and then run it again.
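
    For context on the error itself: FastTreeRegressor is a regression trainer, so it needs a numeric label (R4 or R8), while PaymentType is text. Predicting a text category is a classification task. The following is a minimal sketch of that approach, assuming the same legacy LearningPipeline API (ML.NET 0.x) used in the question. It reuses the TaxiTrip class from the question. The names PaymentTypePrediction and PaymentTypeTrainer are hypothetical, and the choice of StochasticDualCoordinateAscentClassifier is only one possible multiclass trainer, not a recommendation from the documentation.

    ```csharp
    using Microsoft.ML;
    using Microsoft.ML.Data;
    using Microsoft.ML.Runtime.Api;
    using Microsoft.ML.Trainers;
    using Microsoft.ML.Transforms;

    namespace TaxiFarePrediction
    {
        // Hypothetical output class: a classifier writes its answer to
        // "PredictedLabel", not to "Score" as a regressor does.
        public class PaymentTypePrediction
        {
            [ColumnName("PredictedLabel")]
            public string PaymentType;
        }

        // Hypothetical helper; TaxiTrip is the input class from the question.
        static class PaymentTypeTrainer
        {
            public static PredictionModel<TaxiTrip, PaymentTypePrediction> Train(string dataPath)
            {
                var pipeline = new LearningPipeline
                {
                    // The CSV files have a header row, so skip it when loading
                    new TextLoader(dataPath).CreateFrom<TaxiTrip>(useHeader: true, separator: ','),

                    // Copy the text column to "Label", then map each distinct
                    // string to a numeric key the trainer can work with
                    new ColumnCopier(("PaymentType", "Label")),
                    new Dictionarizer("Label"),

                    // One-hot encode the remaining text features
                    // (PaymentType is the label, so it is not a feature here)
                    new CategoricalOneHotVectorizer("VendorId", "RateCode"),

                    new ColumnConcatenator("Features",
                        "VendorId",
                        "RateCode",
                        "PassengerCount",
                        "TripDistance",
                        "FareAmount"),

                    // Multiclass classifier in place of FastTreeRegressor
                    new StochasticDualCoordinateAscentClassifier(),

                    // Map the predicted key back to the original string value
                    new PredictedLabelColumnOriginalValueConverter { PredictedLabelColumn = "PredictedLabel" }
                };

                return pipeline.Train<TaxiTrip, PaymentTypePrediction>();
            }
        }
    }
    ```

    With a model trained this way, model.Predict(TestTrips.Trip1).PaymentType should return one of the payment type strings seen in the training data. The Evaluate method would need a matching change, since RegressionEvaluator and its Rms and RSquared metrics only apply to numeric predictions.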

    -Rohit

    Friday, October 25, 2019 8:53 AM
    Moderator