# False Positive, Average RMS in Analysis Results of VGB

• ### Question

• Hello,

I built a solution in VGB, and I need to report the accuracy/precision of the classifiers. Unfortunately, I did not find any useful references that explain the result values accurately. I reviewed all the available clips and tutorials, but could not make sense of them. For instance, what is the metric for false positives: a percentage, a number of frames, or a portion of clips? I also need to figure out how the false positive values and the average RMS are calculated in the results section.

For my own gesture project, TP = 158, FP = 258, TN = (total frames) - P = 18262 - (158 + 258), and FN = 0, where P = TP + FP. All of these values are reported in number of frames. If I report the Precision, which is TP/P, it is going to be a really poor result. Do I need to report Recall as well?
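For reference, the frame-level arithmetic implied by these counts can be sketched as below. It assumes that P means TP + FP and uses the standard definitions of precision and recall; the function name is made up for illustration. As the reply explains, VGB's own False Positive/Negative values are per gesture, not per frame.

```python
def frame_metrics(tp, fp, fn, total_frames):
    """Frame-level confusion-matrix metrics (standard definitions).

    Assumption: every frame that is not TP, FP or FN counts as a TN.
    """
    tn = total_frames - (tp + fp + fn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {"tn": tn, "precision": precision, "recall": recall}


# Counts quoted in the question above.
metrics = frame_metrics(tp=158, fp=258, fn=0, total_frames=18262)
print(metrics)
```

With these counts, precision is 158/416 (roughly 0.38) and recall is 1.0, because no positive frames were missed.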

Thanks,

Lili

Thursday, January 15, 2015 6:56 AM

• Hello Lili,

The False Positive and False Negative values are reported as fractions; multiply by 100 to get a percentage. These values are not based on the total number of frames, but on the total number of gesture tags in the analysis set. For example, if you have 10 positive tags in your analysis set, and 8 of them were detected, then False Negatives will be 2/10 = 0.2 (20%). If the detector also happens to detect 1 additional gesture that doesn’t actually exist, then your False Positives will be 1/10 = 0.1 (10%).
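The arithmetic in that example can be written out as a small sketch. The function and parameter names are hypothetical and are not part of VGB; the only thing taken from the description above is that both rates are divided by the number of positive tags.

```python
def gesture_error_rates(num_positive_tags, num_tags_detected, num_spurious_detections):
    """Per-gesture error rates, both relative to the number of positive tags."""
    if num_positive_tags <= 0:
        raise ValueError("need at least one positive tag")
    false_negatives = (num_positive_tags - num_tags_detected) / num_positive_tags
    false_positives = num_spurious_detections / num_positive_tags
    return false_positives, false_negatives


# 10 positive tags, 8 detected, 1 extra detection with no matching tag.
fp_rate, fn_rate = gesture_error_rates(10, 8, 1)
print(f"FP = {fp_rate * 100:.0f}%, FN = {fn_rate * 100:.0f}%")
```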

The VGB Whitepaper (http://aka.ms/k4wv2vgb) has a section on analyzing gestures, which might prove helpful (in particular, see the ‘How much data is enough?’ section).

When you drill into the clips, this is how the errors will appear in the detection graph:

The most important errors to minimize are False Positives (FP) and False Negatives (FN).

• Marked as answer by Monday, January 19, 2015 6:50 AM
Saturday, January 17, 2015 12:14 AM
• For more detail about analysis results, we’ll use an example. Let’s say that we have an analysis project with only one clip. This clip has a frame range of 3332 to 3928, and five positive gesture tags. After we analyze this project, the following results are reported:

The first thing that might seem off is the total number of frames used. If you only tagged positive frames, then all valid frames will be included in the total. However, if you tagged frames as positive or negative, then only the tagged frames will be used, and any untagged frames will be ignored. In either case, frames that lack a skeleton will not be included in the total. When we drill into this clip, we find that only positive tags are used, but the skeleton does not appear until frame 3358. This means that our true frame range will be 3358 to 3928, inclusive, for a total of 571 frames.
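The frame count for this clip can be checked with a short sketch. It covers only the case described here (positive tags only, clip range 3332 to 3928, skeleton first visible at frame 3358) and assumes the skeleton stays visible through the end of the clip; the function name is made up.

```python
def usable_frame_count(first_frame, last_frame, first_skeleton_frame):
    """Count frames from the first frame with a skeleton to the end, inclusive.

    Assumption: once the skeleton appears, it is present in every later frame.
    """
    start = max(first_frame, first_skeleton_frame)
    return max(0, last_frame - start + 1)


total = usable_frame_count(first_frame=3332, last_frame=3928, first_skeleton_frame=3358)
print(total)
```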

The next value reported is the ‘Worst Error’. This value will usually be 1 for AdaBoostTrigger projects, which indicates that somewhere in our clip the detection result and the gesture tag differ by a value of 1 (e.g. detection is False when it should be True). If we select the clip, we get a new set of results, which will give us more detail about the error(s):

The clip analysis results are graphed in the same space as our tags. In an ideal world, the spikes on this graph would match perfectly with the start and end of each gesture tag. Unfortunately, this is not an ideal world, so you will almost always have some offset between the start/end of the gesture tag and the start/end of each spike in the detection graph. This offset is what the RMS value is tracking. If you find that your gesture detector is sluggish, then you will want to try to decrease the RMS value. To minimize this value, ensure that the start of each gesture is accurately tagged (tagging consistency is very important when training gestures). There are also some build settings that can be tweaked to get a faster response, but usually adding more training clips and/or improving the tags that you already have is the best approach.
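To get an intuition for how tag/detection offsets feed into these two numbers, here is a rough sketch. The exact formula VGB uses is not given in this thread, so this assumes a plain per-frame root-mean-square of (detection - tag), with 0/1 values for a discrete gesture; the function name and the sample data are made up.

```python
import math


def worst_error_and_rms(tags, detections):
    """Largest absolute per-frame difference and RMS of the differences.

    Assumption: one 0/1 tag value and one 0/1 detection value per frame.
    """
    if not tags or len(tags) != len(detections):
        raise ValueError("tags and detections must be non-empty and equal length")
    errors = [float(d) - float(t) for t, d in zip(tags, detections)]
    worst = max(abs(e) for e in errors)
    rms = math.sqrt(sum(e * e for e in errors) / len(errors))
    return worst, rms


# The detection spike starts and ends one frame later than the tag.
tags = [0, 0, 1, 1, 1, 1, 0, 0]
detections = [0, 0, 0, 1, 1, 1, 1, 0]
worst, rms = worst_error_and_rms(tags, detections)
print(worst, rms)
```

Under this assumption, any single mismatched frame gives a worst error of 1, and the RMS shrinks as the spike edges line up more closely with the tag edges.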

In this particular analysis, there were no False Positives found. If this clip did have a False Positive, it would appear as a spike in the detection graph with no corresponding gesture tag to overlap with. As mentioned earlier, False Positives/Negatives are not calculated on a per-frame basis, but on a per-gesture basis. Because every spike that the gesture detector reported overlaps with a positive gesture tag, there are no False Positives in this analysis set.

False Negatives occur whenever a positive gesture tag is present, but there is not a corresponding spike in the detection graph. In the original graph above, five positive gesture tags are available in the clip, but only three of them were detected. That means that two gestures were not recognized at all. The value reported for False Negatives is 2/5 = 0.4. This is the type of error that you want to work hard to minimize (same with False Positives). Review the tags for accuracy and/or add more training clips to help your gesture detector become better at recognizing the behavior in order to get this value down.
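The per-gesture matching described in the last two paragraphs can be sketched as an interval-overlap check. This is only an illustration of the idea, not VGB's implementation: the frame ranges are made up, and both rates are divided by the number of positive tags, as in the earlier example.

```python
def overlaps(a, b):
    """True if two inclusive (start, end) frame ranges share at least one frame."""
    return a[0] <= b[1] and b[0] <= a[1]


def per_gesture_errors(tags, spikes):
    """Return (false_positive_rate, false_negative_rate) relative to the tag count."""
    if not tags:
        raise ValueError("need at least one positive tag")
    missed = sum(1 for t in tags if not any(overlaps(t, s) for s in spikes))
    spurious = sum(1 for s in spikes if not any(overlaps(s, t) for t in tags))
    return spurious / len(tags), missed / len(tags)


# Hypothetical frame ranges: five tags, three detection spikes.
tags = [(3400, 3440), (3500, 3540), (3600, 3640), (3700, 3740), (3800, 3840)]
spikes = [(3405, 3445), (3610, 3650), (3795, 3835)]
fp_rate, fn_rate = per_gesture_errors(tags, spikes)
print(fp_rate, fn_rate)
```

Every spike overlaps a tag, so there are no False Positives, while the two tags with no spike give False Negatives of 2/5.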

When analyzing your gestures, it is important that you not reuse any clips that were used during training, as this will make your gesture detector appear to be more accurate than it actually is. We suggest separating your clips into training and analysis projects, with ~66% of your clips reserved for training, and the remaining ~33% for analysis.
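One way to keep the two sets disjoint is to split the list of clips once, up front. The sketch below is only a bookkeeping helper with hypothetical clip names; the clips still have to be added to the training and analysis projects in VGB by hand.

```python
import random


def split_clips(clip_names, train_fraction=2 / 3, seed=0):
    """Shuffle clips reproducibly and split them into (training, analysis)."""
    clips = sorted(set(clip_names))
    random.Random(seed).shuffle(clips)
    cut = round(len(clips) * train_fraction)
    return clips[:cut], clips[cut:]


clip_names = [f"clip_{i}" for i in range(1, 10)]  # hypothetical names
training, analysis = split_clips(clip_names)
print(len(training), len(analysis))
```

Fixing the seed keeps the split stable between runs, so a clip never drifts from the analysis set into the training set.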

I hope this information will prove useful,

~Angela

• Marked as answer by Monday, January 19, 2015 6:50 AM
Saturday, January 17, 2015 12:15 AM

### All replies

• Dear Angela,

Thanks very much for your great response. It helped me a lot. I reviewed the whitepaper once again, as you suggested. Can I also ask about Figure 3 in the whitepaper, which shows the percentage of False Positives and the percentage of False Negatives? In this graph, the results are reported in number of frames, which is why I am asking.

P.S. I guess the legend in the graph is incorrect. I mean that '% Error True Positive' should be '% Error False Negative'. Is that correct?

Thanks again,

Lili

• Edited by Saturday, January 17, 2015 1:36 AM
Saturday, January 17, 2015 1:34 AM
• Hey Lili,

You are correct, the legend should say 'False Negatives' instead of 'True Positives'.

In the graph above, the x-axis tracks the number of frames used to train the gesture. This will typically be all of the positive/negative frames used during the build step of your project. When building a gesture project, the output will tell you how many frames were used. These are not the same frames that are used during analysis, so the Total Frames reported in that output will not reproduce the graph above. In order to make a similar graph, you would need to keep track of the build output value every time that you create a new database, and record that as your x value.

The y-axis should be tracking the % of False Positives and False Negatives reported during analysis. If analyzing a specific database reports False Positives of 0.1 and False Negatives of 0.4, then you would record the y values as 10 and 40 for FP and FN, respectively.

This graph is meant to show the overall trend in gesture errors. As you add more and more training data to improve your gestures, you should eventually see your FP and FN values decrease. At some point, your gesture will hit a plateau, where no matter how much data you add to the training set, the FP and FN values will not change. This is the point in time where you will likely be unable to do anything more (via training) to improve your gestures. We call this "chasing the long tail", and it is not expected that you will be able to remove all errors from the system.

By watching how FP and FN values change over multiple iterations of the database (assuming that each database is created with a larger set of training data than the one before), you should get a feel for how your gesture is performing and if you have acquired enough training data.
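A minimal sketch of that bookkeeping is below: one (training frames, FP %, FN %) point per database build, plus a simple plateau check. The numbers are invented for illustration, and the `window` and `tolerance` parameters are arbitrary choices rather than anything VGB defines.

```python
def record(history, training_frames, fp_rate, fn_rate):
    """Store one graph point: x = frames used in the build, y = FP % and FN %."""
    history.append((training_frames, fp_rate * 100, fn_rate * 100))


def has_plateaued(history, window=3, tolerance=1.0):
    """True if FP % and FN % each varied by at most `tolerance` over the last `window` builds."""
    if len(history) < window:
        return False
    recent = history[-window:]
    fp = [point[1] for point in recent]
    fn = [point[2] for point in recent]
    return max(fp) - min(fp) <= tolerance and max(fn) - min(fn) <= tolerance


history = []
# Invented values: (frames reported by the build, FP, FN) per database iteration.
for frames, fp, fn in [(1000, 0.30, 0.50), (2000, 0.20, 0.45), (4000, 0.10, 0.40)]:
    record(history, frames, fp, fn)
early = has_plateaued(history)

for frames, fp, fn in [(6000, 0.10, 0.40), (8000, 0.10, 0.40)]:
    record(history, frames, fp, fn)
late = has_plateaued(history)
print(early, late)
```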

~Angela

Monday, January 19, 2015 9:24 PM