body.HandLeftState HandState false positives

  • Question

  • How to avoid a false positive on a body.HandLeftState HandState?

    TO REPEAT THE TEST:

    Run the BodyBasics sample: your hands will flash blue, red, and green for moments just from normal human motion.

    So there's a Body property, HandRightConfidence (a TrackingConfidence), that provides the confidence of the body's right-hand tracking state. The docs describe it as:

    The confidence level of a body's tracked attribute. 

    What makes something a "High" or "Low"?   

    <RANT> I love code-driven docs that repeat the obvious; no wonder the agile guys say you don't need documentation. You definitely need forums to explain what was expected, or any meaningful elaboration. haha. </RANT>

    I'm starting to expect that TrackingConfidence is worthless.

    Here's the deal: I'm looking for a quick, cheap, easy way to catch a gesture and do something. This "seemed" like a shortcut over the process of recording me and all my cubicle cell mates jumping around gesticulating like patients at an insane asylum, then putting it through the gesture-recognition grinder and hoping it flags the correct gesture. Can you imagine if the movie War Games had a realistic implementation: "So launch all missiles, not play checkers?" I'll take those arm movements as a yes.

    I added this to the BodyBasics sample to draw the hand circle twice as big whenever the confidence level had any value... It still shows momentary flashes of a big circle vs. small ones. It appears that the small ("Low" confidence) results happen when the hand is not held out, or is caught at a funny location, maybe?

                                // TrackingConfidence.Low = 0 -> size 30, High = 1 -> size 60
                                this.HandSize = 30f + (int)body.HandLeftConfidence * 30f;
                                this.DrawHand(body.HandLeftState, jointPoints[JointType.HandLeft], dc);
                                this.HandSize = 30f + (int)body.HandRightConfidence * 30f;
                                this.DrawHand(body.HandRightState, jointPoints[JointType.HandRight], dc);

    I was thinking of a cyclic ring buffer approach: scoring the last 30 results and not registering a positive until the state has been held for some length of time X, and only then saying "oh yeah, that guy meant to do that." Which of course would be a different length of time for my octogenarian mother versus my instant-gratification-generation children.
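    As a sketch of that "registered for some length of time" idea (hypothetical class and names, with plain ints standing in for the SDK's HandState enum so it stands alone), a time-based debounce might look like:

```csharp
using System;

// Hypothetical time-based debounce: a candidate state only becomes the
// accepted state after being observed continuously for holdTime.
public class HandStateDebouncer
{
    private readonly TimeSpan holdTime;  // the tunable knob, per audience
    private int candidateState = -1;     // plain int stand-in for HandState
    private DateTime candidateSince;

    public int AcceptedState { get; private set; }

    public HandStateDebouncer(TimeSpan holdTime)
    {
        this.holdTime = holdTime;
        this.AcceptedState = -1;
    }

    // Feed one per-frame observation; returns true when the accepted state changes.
    public bool Observe(int state, DateTime now)
    {
        if (state != this.candidateState)
        {
            this.candidateState = state;   // new candidate, restart the clock
            this.candidateSince = now;
        }

        if (this.candidateState != this.AcceptedState &&
            now - this.candidateSince >= this.holdTime)
        {
            this.AcceptedState = this.candidateState;
            return true;
        }

        return false;
    }
}
```

    The hold time is the per-user dial: longer for the octogenarian, shorter for the instant-gratification crowd.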

    Any thoughts or experience from people who have tried the built-in hand-gesture engine vs. anything else?


    Ken

    Saturday, September 6, 2014 3:42 PM

Answers

  • Sorry for the delay.

    Hand states are a per-frame estimation of the hand state data and were designed to be used as input to a filter. Using a circular buffer for your filter is the right thing to do, but 30 is a bit high; 5-10 should be good enough. This will depend greatly on the latency-threshold requirements for your application. It gives developers the flexibility to balance buffer size, thresholds, and filters as they wish. The more High-confidence values in the buffer, the higher the overall confidence.

    VGB would be the way to go for a data-driven approach for any gesture you are detecting, if it's not grip/release. It already provides confidence values as part of its data.

    Some other notes on filtering hand state:
    - Watch for N consecutive matching frames before changing state.
    - Use confidence to reduce latency (Low = require X consecutive frames; High = switch the state right away).
    - You can use the hand tip joint: the distance from the hand tip to the hand joint can provide another option.
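    The consecutive-frames rule above can be sketched like this (the enums here are illustrative stand-ins for the SDK types, and the frame counts are just example values):

```csharp
// Illustrative stand-ins for the SDK's HandState / TrackingConfidence enums.
public enum HandState { Unknown, NotTracked, Open, Closed, Lasso }
public enum TrackingConfidence { Low, High }

// Only change the reported state after N consecutive matching frames,
// where N shrinks when tracking confidence is High.
public class ConsecutiveFrameFilter
{
    private readonly int lowFramesRequired;    // e.g. 8 when confidence is Low
    private readonly int highFramesRequired;   // e.g. 2 when confidence is High
    private HandState candidate = HandState.Unknown;
    private int runLength;

    public HandState Current { get; private set; }

    public ConsecutiveFrameFilter(int lowFramesRequired, int highFramesRequired)
    {
        this.lowFramesRequired = lowFramesRequired;
        this.highFramesRequired = highFramesRequired;
        this.Current = HandState.Unknown;
    }

    public HandState Update(HandState observed, TrackingConfidence confidence)
    {
        if (observed == this.candidate)
        {
            this.runLength++;
        }
        else
        {
            this.candidate = observed;     // streak broken, start counting again
            this.runLength = 1;
        }

        int required = confidence == TrackingConfidence.High
            ? this.highFramesRequired
            : this.lowFramesRequired;

        if (this.runLength >= required)
        {
            this.Current = this.candidate;
        }

        return this.Current;
    }
}
```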
     


    Carmine Sirignano - MSFT

    • Marked as answer by Ken MacPherson Tuesday, September 16, 2014 1:50 AM
    Monday, September 15, 2014 9:31 PM

All replies

  • How is your sensor set up? It is best to have the sensor mounted 6 feet from the floor with no obstructions in front of it (e.g., the edge of a table) so that it has a clear view of the floor. With the sensor pointed at the floor, the body machine learning will be better able to calculate all the joints and the state of the hands.


    Carmine Sirignano - MSFT

    Tuesday, September 9, 2014 7:06 PM
  • I'll check that. I've been using playback mode, so the particular recording I was using was made at home, about 4 feet up and pointed down. I'll create a clean-room sample lab setup as you suggest. However, that really doesn't answer the questions, nor the reality that optimum laboratory placement is not always available in field usage.

    I think this discussion and set of questions applies equally to the discrete-gestures approach; there the API provides a

    DiscreteGestureResult & Confidence (float)

    I'm not sure what a float Confidence means. A float of what? Is 0 no confidence and, say, .9999 very positive? What is the range of the value?

    Does anybody know what's happening in the Confidence (for either HandState or DiscreteGestureResult, or both)? It would be helpful to understand how that value gets generated and what affects it.

    With all that in mind, I think the programmer or consumer of this data has to evaluate the situation, the requirements, and what is coming at them. For me, in our work with the simple hand gestures, the "noise" factors that occur in real life pointed me to a time-based smoothing approach, like the ones stock-market applications use on bid/ask ticks to handle noise.

    In fact, this is a lot like using complex event processing engines, where you have to filter the inbound data to normalize or smooth it. It would be good if the API gave some features for tuning this in the machine engine. This is where getting source code, as in an open-source project, is cool, because you can dig in and see what the function is actually doing instead of <RANT> crappy docs that repeat what the method name typically already tells you </RANT>.

    They are having a similar discussion about this guy's problem here:
    http://social.msdn.microsoft.com/Forums/en-US/c832d539-39ea-40e0-955c-35b831683446/about-rfrprogress-of-gesture-builder?forum=kinectv2sdk

    SOLUTION:
    So I created a RingBuffer class (a standard collection like a Queue, but where the Capacity is fixed and stale items automatically fall out of the buffer when a new one is added, unlike a Queue where you have to manually trim it to size). I set the size of the buffer to 30 (about 1 second of frames), then I started my hand gestures.
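    A minimal sketch of such a fixed-capacity ring buffer (hypothetical code, just to show the shape, not the exact class from the project) might be:

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Fixed-capacity ring buffer: once full, adding a new item silently
// overwrites the oldest one, so the buffer always holds the most
// recent Capacity entries.
public class RingBuffer<T> : IEnumerable<T>
{
    private readonly T[] items;
    private int next;   // index where the next item will be written
    private int count;

    public RingBuffer(int capacity)
    {
        if (capacity <= 0) throw new ArgumentOutOfRangeException("capacity");
        this.items = new T[capacity];
    }

    public int Count { get { return this.count; } }

    public void Add(T item)
    {
        this.items[this.next] = item;
        this.next = (this.next + 1) % this.items.Length;
        if (this.count < this.items.Length) this.count++;
    }

    // Enumeration order is not arrival order, which is fine
    // when all you do is count matching entries.
    public IEnumerator<T> GetEnumerator()
    {
        for (int i = 0; i < this.count; i++)
        {
            yield return this.items[i];
        }
    }

    IEnumerator IEnumerable.GetEnumerator() { return this.GetEnumerator(); }
}
```

    Since it implements IEnumerable&lt;T&gt;, the LINQ Count(predicate) check in the decision code later in this post works directly on the buffer.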

    Here's what a snapshot of the buffer looks like for a given noisy false positive session:

      [0] Lasso Microsoft.Kinect.HandState
      [1] Lasso Microsoft.Kinect.HandState
      [2] Lasso Microsoft.Kinect.HandState
      [3] Lasso Microsoft.Kinect.HandState
      [4] Lasso Microsoft.Kinect.HandState
      [5] Lasso Microsoft.Kinect.HandState
      [6] Lasso Microsoft.Kinect.HandState
      [7] Lasso Microsoft.Kinect.HandState
      [8] Closed Microsoft.Kinect.HandState
      [9] Closed Microsoft.Kinect.HandState
      [10] Closed Microsoft.Kinect.HandState
      [11] Closed Microsoft.Kinect.HandState
      [12] Lasso Microsoft.Kinect.HandState
      [13] Unknown Microsoft.Kinect.HandState
      [14] Unknown Microsoft.Kinect.HandState
      [15] Unknown Microsoft.Kinect.HandState
      [16] Lasso Microsoft.Kinect.HandState
      [17] Lasso Microsoft.Kinect.HandState
      [18] Lasso Microsoft.Kinect.HandState
      [19] Lasso Microsoft.Kinect.HandState
      [20] Lasso Microsoft.Kinect.HandState
      [21] Lasso Microsoft.Kinect.HandState
      [22] Lasso Microsoft.Kinect.HandState
      [23] Lasso Microsoft.Kinect.HandState
      [24] Lasso Microsoft.Kinect.HandState
      [25] Lasso Microsoft.Kinect.HandState
      [26] Unknown Microsoft.Kinect.HandState
      [27] Unknown Microsoft.Kinect.HandState
      [28] Lasso Microsoft.Kinect.HandState
      [29] Lasso Microsoft.Kinect.HandState


    Then in my decision code, I check for a state change like this instead:

    ring.Count(n => n == HandState.Closed) > selectedThreshold

    Where selectedThreshold = 15, but it's a value I could change or "tune" as needed for more confidence; if I need six-sigma confidence then > 28. haha.

    Yeah, I was thinking about using a Naive Bayes classifier, but this solution is due next week. Time to market is why I used the KISS approach of body.HandState, in hopes that it would be a 30-minute coding solution (maybe half a morning). Not quite, but as of right now it seems pretty steady and reliable at capturing the simple hand states and firing a command.

    Then I had to write my own event handler too, because really, who wants a single method with 4,000 lines of logic embedded? It would be cool if the hand states had a built-in HandStateChanged event, but then again the raw event wouldn't help, because it needs to be filtered with a buffer first and a programmable threshold.
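    A filtered HandStateChanged could be wrapped up roughly like this (hypothetical wrapper; plain ints stand in for the SDK's HandState, and the buffer/threshold mirror the ring-buffer approach above):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical wrapper: raises an event only when a hand state clears
// the buffer threshold -- effectively a debounced HandStateChanged.
public class FilteredHandState
{
    private readonly Queue<int> buffer = new Queue<int>(); // int stand-in for HandState
    private readonly int capacity;
    private readonly int threshold;
    private int current = -1;

    public event Action<int> HandStateChanged;

    public FilteredHandState(int capacity, int threshold)
    {
        this.capacity = capacity;
        this.threshold = threshold;
    }

    // Call once per frame with the raw per-frame hand state.
    public void Observe(int state)
    {
        this.buffer.Enqueue(state);
        if (this.buffer.Count > this.capacity)
        {
            this.buffer.Dequeue();          // drop the stale frame
        }

        // Fire only when enough recent frames agree on a *new* state.
        if (state != this.current &&
            this.buffer.Count(s => s == state) > this.threshold)
        {
            this.current = state;
            var handler = this.HandStateChanged;
            if (handler != null)
            {
                handler(state);
            }
        }
    }
}
```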

    I have a feeling the body.HandState is going to be thrown away, or at least ignored and forgotten. So I'm using it to get me through the short term, but the abstract inner monologue I'm sharing here is still valid for the discrete-gesture stuff; you are going to run into the same thing. Maybe the Confidence float is a better measure than the crappy body hand Confidence enum of (Bad or Worst). The Confidence enum for hand gestures does work, and it provides a better confidence level than nothing, but it's still crap for real usage; you'll still need to buffer it, just not as much, to remove noise. Likewise, from what I've seen of the "seated" sample, it will flag false noise once in a while, so I'm betting that a buffer system for VisualGestureBuilder results will be a requisite for go-live.

    This was still quicker than the Gesture Builder stuff. That's a long, drawn-out, arduous process of recording, testing, marking, training, and testing the database for matches. Still quicker than writing if/then/elseif code if you are doing it yourself.

    SUGGESTION/QUESTION (does this already exist somewhere?):
    In fact, MSFT Kinect or the community should have a library of ".gbd" files already done and published; maybe use NuGet to import them as needed. The seated sample is about 130 KB, and the Hello test I did was about the same.

    It would be very productive to be able to just NuGet install-package Kinect.GestureDetection.Database.BusinessHandGestures.


    Ken

    Wednesday, September 10, 2014 1:10 PM
  • Basically, I did this for speed of completion.

            // one filter per hand; 30 entries is roughly 1 second at 30 fps
            private RingBuffer<HandState> rightHandStates = new RingBuffer<HandState>(30);
            private RingBuffer<HandState> leftHandStates = new RingBuffer<HandState>(30);

    I think a specialized struct incorporating state, score, and the body's wrist-to-fingertip joint delta would be very tight, but that's extra work not in the project plan/sprint deadline. I think your suggestion of a score > 10 would be sufficient to remove noise too. Right now, the > 15 gives imperceptible latency and stable responses.
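    That specialized struct could be as small as this (hypothetical names; the tip-to-wrist distance would come from the joint positions):

```csharp
// Hypothetical per-frame sample combining state, a confidence score,
// and the hand-tip-to-wrist distance as an extra filtering signal.
public struct HandSample
{
    public readonly int State;                // stand-in for HandState
    public readonly int ConfidenceScore;      // e.g. 0 = Low, 1 = High
    public readonly float TipToWristDistance; // metres, from the joint positions

    public HandSample(int state, int confidenceScore, float tipToWristDistance)
    {
        this.State = state;
        this.ConfidenceScore = confidenceScore;
        this.TipToWristDistance = tipToWristDistance;
    }
}
```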

    Thanks for the response and feedback. I agree that scoring a High hand-state confidence more heavily than a Low one (a multiplied-score algorithm) is a good move too, but I tried the simple ring buffer first and the quality was sufficient for our needs at this juncture. That was on my list of things to try next if the simple score > 15 didn't work.

    I'll consider that if I need to tune the hand-gesture stuff a bit tighter, or for faster/lower latency. Also, I think these concepts will be helpful for VGB too, even though some of that can be optimized by how well one builds the database.


    Ken

    Tuesday, September 16, 2014 2:06 AM