Visual Gesture Builder - training for the negative

  • Question

  • Hi again.

    In my project I am trying to track an air squat.

    In this particular case, a partially completed gesture doesn't count (like they didn't stand all the way up, or they didn't get down low enough, etc).

    Granted, in setting up the positives, I can establish the criteria for completed discrete gestures.  But the documentation suggests that training for failed gestures is important, too. 

    Do I just train lots of gestures in some varying degrees of partially completed state and mark them all as negative?  If failing to stand up all the way is a fail, and the subject is going to pass through that position on the way to completing the gesture, won't that confuse the ML?

    Excited about this project, but accuracy is important, and before I mark hundreds of videos, I want to make sure I know what I should be doing.

    TIA

    Bill

    Monday, June 22, 2015 10:07 PM

Answers

  • Hello Bill,

    You definitely do NOT want to tag a critical section of your gesture as negative, as this will confuse the trainer. The discrete gesture gives a simple true or false result: either the gesture is happening or it is not. Discrete gestures cannot tell you if a gesture has been fully completed. For that, you need to use a continuous gesture (see the other post).

    For negative training samples, think about gestures that are similar to the air squat but should not be counted as a squat. By default, all frames that are not marked as positive will be considered negative gesture frames. So, this will include frames where the user is holding the squat, moving down into the squat, and standing. You could also include the user sitting down on a chair or exercise ball to help eliminate cheating by the user, but you'll have to test whether this hurts your detector when they stand back up. Depending on the type of squat, a squat with the user's feet together might be considered negative. If you don't have 'Ignore Left Arm' and 'Ignore Right Arm' set to 'true', then you will also need to think of scenarios where the arms are invalid (stretched out to the side, above the user's head, etc.). I would recommend ignoring the arms to help reduce your training and testing matrix.
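
    To illustrate the default rule above (every frame not tagged positive counts as negative), here is a minimal Python sketch. It is not part of VGB or its API: frame_labels and the inclusive (start, end) frame ranges are hypothetical names chosen for the example.

```python
def frame_labels(total_frames, positive_ranges):
    """Return one boolean per frame: True inside any tagged positive range.

    positive_ranges is a list of inclusive (start, end) frame indices.
    This is a hypothetical helper for illustration, not a VGB function.
    """
    labels = [False] * total_frames  # untagged frames default to negative
    for start, end in positive_ranges:
        first = max(start, 0)
        last = min(end, total_frames - 1)
        for i in range(first, last + 1):
            labels[i] = True
    return labels


# A 10-frame clip with two tagged positive ranges.
labels = frame_labels(10, [(2, 4), (7, 8)])
negative_count = labels.count(False)
```

    In this toy clip, the five untagged frames are the implicit negatives. That is why a critical part of the gesture left untagged ends up working against the detector.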

    Typically, you want to train your gesture in several phases. You'll create a gesture prototype (~10 clips, 30 sec to 1 min in length, each with ~5-10 positive examples of the gesture), and then load your prototype database into the VGB Viewer tool and test it. Try different motions to find what works and what doesn't. If you find a motion that you don't want to be considered a squat, but your gesture database is detecting it as a squat, then you should make some recordings of that motion and include them as negative training samples. I like to work with small batches of clips (adding ~10 new clips at a time) to see how they affect the database. Testing your database with each build is key to deciding which clips you need to create next to improve your detector.

    ~Angela

    Thursday, June 25, 2015 5:49 AM

All replies

  • Thanks, Angela.

    OK.  I can be a bit more specific.  I try not to make my questions too long...

    There are actually multiple exercises I'm trying to detect.  One is a simple air squat, defined like this:

    1. Hips below knees at the bottom
    2. Full extension of the hips at the top.
    3. Arm position is irrelevant

    So I am creating one gesture that shows item #1 and another gesture that shows item #2.  I really don't care about continuous gestures because all I want to know is that 1 is detected and 2 is detected before another 1 is detected.  Percentage completion is not important to me (at least for now).

    However, an overhead squat has a very similar definition:

    1. Hips below knees at the bottom
    2. Full extension of the hips at the top.
    3. Arm position is overhead and extended outward with elbows locked

    So now I can use the same clips as the air squat to determine that item 1 is detected, but new clips that show hip extension and arm position for the overhead squat.  Same as before, I need them in order and both of them before the full gesture is complete.

    So here's the quandary:  When I get the hip extensions for the air squat, I don't care where the arms are, unless they are in a position that matches the overhead squat.  In that case, I want to detect an overhead squat finish gesture instead of an air squat finish gesture.

    So, do I use the overhead squat finish gesture with arms extended as a negative training clip for the air squat finish gesture?

    That's how the ML would tell the difference, right?

    Also, since passing through the bottom of the air squat is all that's needed to detect the bottom gesture, my markings are very, very short.  Does that seem like it makes sense?  Otherwise I'd be detecting a change in direction, etc.

    P.S. Loved the deep dive videos, they were super helpful.  And it's awesome that you are helping in this forum.

    Thursday, June 25, 2015 3:57 PM
  • Hey Bill,

    Yes, what you describe makes sense and should work. There are a ton of different approaches that can be made through a combination of tagging, training, and coding. This would be my approach (but that doesn't mean it's the best one):

    Air Squat start - Ignore left/right arms. Include Overhead Squat clips as positive training examples.

    Air Squat end - Ignore left/right arms. Include Overhead Squat clips as positive training examples.

    Overhead Squat start - Do not ignore arms. Include Air Squats without correct arms as negative training examples.

    Overhead Squat end - Do not ignore arms. Include Air Squats without correct arms as negative training examples.

    In code, you can decide how to deal with the two gestures. In the following pseudo-code (totally untested), I only award the overhead squat if the user had their arms above their head for both start and end gestures. If one of the gestures did not have the hands overhead, then I count it as a normal Air Squat instead. Also, I only check for the AirSquat if the OverheadSquat is not already occurring:

    // Check if any previously-detected squats have ended
    if(overheadSquatting)
    {
       if(overheadSquatEnd.detected)
       {
          // arms were overhead at both the start and the end
          overheadSquatCount++;
          overheadSquatting = false;
       }
       else if(airSquatEnd.detected)
       {
          // arms were not overhead at the end, so count a normal air squat
          airSquatCount++;
          overheadSquatting = false;
       }
    }
    else if(airSquatting)
    {
       if(airSquatEnd.detected || overheadSquatEnd.detected)
       {
          airSquatCount++;
          airSquatting = false;
       }
    }
    else
    {
       // Check for the beginning of a new squat
       if(overheadSquatStart.detected)
       {
          // set a flag to indicate the user is doing an overhead squat
          overheadSquatting = true;
       }
       else if(airSquatStart.detected)
       {
          // set a flag to indicate the user is doing an air squat
          airSquatting = true;
       }
    }
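
    For anyone who wants to try the counting logic without a sensor, here is a runnable Python sketch of the same state machine. It makes assumptions that are not from the Kinect SDK: each frame is modeled as a set of the gesture names detected on that frame, and SquatCounter and the gesture name strings are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SquatCounter:
    """Counts air and overhead squats from per-frame detection sets."""
    air_squat_count: int = 0
    overhead_squat_count: int = 0
    overhead_squatting: bool = False
    air_squatting: bool = False

    def update(self, detected):
        """detected: set of (hypothetical) gesture names seen this frame."""
        if self.overhead_squatting:
            if "OverheadSquatEnd" in detected:
                # arms overhead at both start and end
                self.overhead_squat_count += 1
                self.overhead_squatting = False
            elif "AirSquatEnd" in detected:
                # arms not overhead at the end: count a normal air squat
                self.air_squat_count += 1
                self.overhead_squatting = False
        elif self.air_squatting:
            if "AirSquatEnd" in detected or "OverheadSquatEnd" in detected:
                self.air_squat_count += 1
                self.air_squatting = False
        elif "OverheadSquatStart" in detected:
            self.overhead_squatting = True
        elif "AirSquatStart" in detected:
            self.air_squatting = True


# Simulated detections: one air squat, one overhead squat, then an
# overhead start that ends without the arms overhead.
frames = [
    {"AirSquatStart"},
    set(),
    {"AirSquatEnd"},
    {"AirSquatStart", "OverheadSquatStart"},
    {"AirSquatEnd", "OverheadSquatEnd"},
    {"AirSquatStart", "OverheadSquatStart"},
    {"AirSquatEnd"},
]

counter = SquatCounter()
for detected in frames:
    counter.update(detected)
```

    The overhead gestures are checked before the air squat gestures because, with the training setup described above, an overhead squat also fires the air squat detectors.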

    Since you're not tagging the dynamic motion that occurs when the user moves from the start to the end position, you might be able to improve your training by including static poses of each gesture in your training set (hold the squat in the beginning and end positions, and mark each held frame as positive for the corresponding gesture). This should give you a lot more frames to tag and train with.

    I'm very glad that you found the videos helpful. Good luck with your project :)

    ~Angela


    Friday, June 26, 2015 1:55 AM