Confusion Matrix has rows/columns with values that are not valid labels

  • Question

  • I created a classification model with a Multiclass Decision Forest. My data is labeled with two distinct labels: 'Credited' and 'Not Credited'. When I view the outcome of the classifier, I see only these two values when I visualize the scored dataset. However, when I add the Evaluate Model module, I see numerical category labels that appear nowhere else as labels in my data. When I inspect the scored data manually, the classification seems to have run as expected. Where are these values coming from, and how can I get rid of them?
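
    One likely source of such numbers: evaluation tools commonly index classes by integer position rather than by name, so a confusion matrix may show 0 and 1 where 'Credited' and 'Not Credited' are meant. The plain-Python sketch below (made-up predictions, not the Evaluate Model internals) shows how that name-to-index mapping arises:

```python
# Hypothetical scored data mirroring the 'Credited' / 'Not Credited' scenario.
y_true = ['Credited', 'Not Credited', 'Credited', 'Credited', 'Not Credited']
y_pred = ['Credited', 'Not Credited', 'Not Credited', 'Credited', 'Not Credited']

# Evaluation code often maps each class name to a numeric index; those
# indices are what can surface as "labels" in a confusion matrix display.
labels = ['Credited', 'Not Credited']
index = {name: i for i, name in enumerate(labels)}  # class name -> row/column index

# Build the 2x2 confusion matrix by counting (true, predicted) pairs.
cm = [[0, 0], [0, 0]]
for t, p in zip(y_true, y_pred):
    cm[index[t]][index[p]] += 1

for i, name in enumerate(labels):
    print(i, name, cm[i])  # numeric index alongside the class name it stands for
```

    Reading the matrix back through the `index` mapping recovers the original class names for each row and column.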

    Wednesday, August 14, 2019 9:23 PM

All replies

  • Hello Laura,

    The Evaluate Model module measures the accuracy of a trained model. You provide a dataset containing scores generated from a model, and the Evaluate Model module computes a set of industry-standard evaluation metrics, which are returned based on the type of model being evaluated; in this case it is a classification model. For classification models, the metrics returned can include any of the following, ranked by the metric you select for evaluation:

    • Accuracy measures the goodness of a classification model as the proportion of true results to total cases.
    • Precision is the proportion of true results over all positive results.
    • Recall is the fraction of all correct results returned by the model.
    • F-score is computed as the weighted average of precision and recall between 0 and 1, where the ideal F-score value is 1.
    • AUC measures the area under the curve plotted with true positives on the y axis and false positives on the x axis. This metric is useful because it provides a single number that lets you compare models of different types.
    • Average log loss is a single score used to express the penalty for wrong results. It is calculated as the difference between two probability distributions – the true one, and the one in the model.
    • Training log loss is a single score that represents the advantage of the classifier over a random prediction. The log loss measures the uncertainty of your model by comparing the probabilities it outputs to the known values (ground truth) in the labels. You want to minimize log loss for the model as a whole.
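
    The first few metrics above follow directly from the confusion-matrix counts. As a minimal sketch in plain Python (the counts are made up, and this is not the Evaluate Model implementation):

```python
# Hypothetical binary confusion-matrix counts, treating 'Credited' as the
# positive class: true/false positives, false/true negatives.
tp, fp, fn, tn = 40, 5, 10, 45

accuracy  = (tp + tn) / (tp + fp + fn + tn)  # true results over total cases
precision = tp / (tp + fp)                   # true positives over all positive results
recall    = tp / (tp + fn)                   # fraction of actual positives recovered
f_score   = 2 * precision * recall / (precision + recall)  # harmonic mean, in [0, 1]

print(f"accuracy={accuracy:.3f} precision={precision:.3f} "
      f"recall={recall:.3f} f_score={f_score:.3f}")
```

    AUC and the log-loss metrics need the model's predicted probabilities rather than just the hard counts, which is why they are listed separately above.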

    Could you please check whether the values you are seeing correspond to these metrics? They are what the Evaluate Model module reports after a run.


    Friday, August 16, 2019 10:47 AM