Visual Gesture Builder and knowing when a gesture is finished

  • Question

  • Hi,

    In my project, I'm trying to track an air squat.  Basically I am tracking when the hips are below the knees and when the subject stands completely up and knees and hips are fully extended.

    I need to be able to count how many squats someone has done.

    I was thinking of starting the gesture when the hips are below the knees and then finishing it when they are standing up.  Technically, there isn't a time limit and I don't really think it matters how they got down to the squat position, just that they got back up.

    So I thought the gesture would be marking the bottom as a discrete gesture and the top as a discrete gesture and then creating a continuous gesture from those two.

    But what if they start back down before they get fully extended?  In other words, they never really reach the full extension?  I know at that point the continuous gesture would never reach a value of 1, so what happens when they start another continuous gesture (hit bottom) without finishing the first one?

    Since I'm not doing animation, would it be just as easy to look for both gestures, in sequence, and if that doesn't happen we just discard that attempt (alerting the user) and start over?  There are more complex gestures to be done, but they all share some of the same characteristics: multiple discrete gestures that *must* be done in sequence.

    Not sure what a continuous gesture would do for me here...

    Am I on the right track?

    TIA

    Monday, June 22, 2015 10:01 PM

Answers

  • Hey Bill,

    You will probably want to record the gesture being performed at different speeds. The longer squats will allow you to get more positive frames in the training set. Training at a lower speed should not keep the detector from recognizing the gesture at higher speeds, as long as you also include higher speed clips in the training set. By mixing speeds, you'll teach the ML that velocity is not an important feature.

    Typically, the smaller the gesture (in terms of frames), the more difficult it is to train, because the body changes are so slight during this period that the ML doesn't have a lot to go on. I've had the most success with gestures that are >10 frames in length, but have also trained gestures that are only 3 frames long by adding more training clips.

    When tagging, you should include a few frames that lead up to your critical spot (the bottom of the squat), along with a few trailing frames. This will help the gesture to be recognized faster, and reduce the chance that it will be missed entirely due to delays in the sensor/gesture recognition software.

    Since you don't need to know the percentage of completion, your original idea of using two discrete gestures (one to mark the bottom of the squat, and one to mark the top) will probably work fine. If the second gesture does not occur after the first, then you can assume that the gesture was performed incorrectly.
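
    The two-gesture sequence described above could be counted with a small state machine. This is only a sketch in plain Python, not Kinect SDK code: the function name `count_squats` and the per-frame `(bottom_detected, top_detected)` pairs are assumptions standing in for whatever your detectors report each frame. It counts a rep when the top gesture follows the bottom gesture, and records a failed attempt when the bottom is hit again before the top was reached.

```python
from typing import Iterable, Tuple


def count_squats(frames: Iterable[Tuple[bool, bool]]) -> Tuple[int, int]:
    """Count squats from per-frame detector results.

    Each frame is a hypothetical (bottom_detected, top_detected) pair.
    Returns (completed, failed).
    """
    completed = 0
    failed = 0
    waiting_for_top = False   # True once the bottom gesture has been seen
    prev_bottom = False       # used to detect the first frame of a new bottom

    for bottom, top in frames:
        # A new bottom starts only on the False -> True transition, so a
        # bottom held over several frames is not counted more than once.
        bottom_started = bottom and not prev_bottom
        prev_bottom = bottom

        if not waiting_for_top:
            if bottom_started:
                waiting_for_top = True
        elif top:
            # Bottom followed by top: one full repetition.
            completed += 1
            waiting_for_top = False
        elif bottom_started:
            # Back at the bottom without reaching full extension.
            failed += 1

    return completed, failed
```

    For example, a stream containing one clean rep, then a rep where the subject drops back to the bottom before standing fully and only then stands up, yields two completed reps and one failed attempt. Whether a failed attempt should also discard the rep that follows it is a design choice left to the application.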

    ~Angela

    Friday, June 26, 2015 12:12 AM

All replies

  • Hello whnoel,

    Because you only want to count the gesture when it is completed, you will need to use a continuous gesture to track its progress. The discrete gesture(s) are required to know when the squat is occurring, so that you can accurately interpret the signal from the continuous gesture. For this gesture, you could use a single discrete gesture to track the squat.

    For tagging the discrete gesture, you will want to include a few frames where the user is holding the squat, but the majority of the frames should cover the motion of the user moving upwards, followed by a few frames where they are in the final position (gesture complete). The continuous gesture can follow the same motion, and if you have already tagged the discrete gesture in all of your clips, you could use the 'Generate Tags' option, with 0 as the start value and 1 as the end value.

    In code, you will first check whether the discrete gesture is happening. If it is, you will check the result of the continuous gesture. If the discrete gesture changes to false before the continuous gesture reaches 1 (or comes within some tolerance of it, such as 0.8), then that squat should not be counted in your program. If the discrete gesture is not happening, then there is no need to check the continuous gesture result.
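
    The check described above might look roughly like this. It is a sketch in plain Python rather than Kinect SDK code: `count_completed_reps` and the per-frame `(detected, progress)` pairs are assumed stand-ins for the discrete result and the continuous progress value, and the 0.8 default is the example tolerance mentioned above.

```python
from typing import Iterable, Tuple


def count_completed_reps(
    frames: Iterable[Tuple[bool, float]], threshold: float = 0.8
) -> Tuple[int, int]:
    """Gate a continuous progress value with a discrete gesture.

    Each frame is a hypothetical (detected, progress) pair.
    Returns (completed, abandoned).
    """
    completed = 0
    abandoned = 0
    active = False  # True while the discrete gesture is being detected
    peak = 0.0      # highest progress seen during the current gesture

    for detected, progress in frames:
        if detected:
            if not active:
                # Discrete gesture just started: begin tracking progress.
                active = True
                peak = 0.0
            peak = max(peak, progress)
        elif active:
            # Discrete gesture just ended: score this attempt.
            if peak >= threshold:
                completed += 1
            else:
                abandoned += 1
            active = False
        # While the discrete gesture is not happening, progress is ignored.

    # A gesture still in progress when the stream ends is not scored.
    return completed, abandoned
```

    Tracking the peak progress, rather than the value on the last frame, is one way to avoid losing a rep if the continuous value dips slightly just before the discrete gesture ends.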

    I hope this explanation helps,

    ~Angela

    Thursday, June 25, 2015 5:03 AM
  • Angela,

    One aspect of this is that people are snapping these out as quickly as possible - sometimes in competitions.  I can stage whatever delays I want during recording but in actual use there won't be any pausing at the bottom or the top.

    I suppose I can do some experimenting, but for our purposes, it only matters that certain hip, knee or arm extensions are achieved.  Percentage completion doesn't really have any value.  Does the ML need a minimum number of frames to see a gesture and learn from it?

    I totally understand the relationship between the discrete and continuous gestures, such as setting context, etc.  I'm just not sure what it gets me for our uses.  Perhaps more data for the ML.

    Thanks for responding.

    Bill

    Thursday, June 25, 2015 4:12 PM
  • Angela,

    I just finished marking about 10 clips, each with 10 gestures in it.  I think I see the problem.

    I was very careful to mark the exact point at which the subject's body position was good enough to show the bottom of the squat.  But very little got picked up when I did the Live preview.

    I presume that my tagged gesture was just too short.

    I'm going to try again in analysis.

    Thursday, June 25, 2015 5:45 PM