KinectDTW - open source Kinect gesture recognition project released

    General discussion

  • This evening I published to Codeplex the gesture recording and recognition project I’ve been working on. Massive shout out to Rhemyst for doing most of the thinking on this. Grab it here:

    http://kinectdtw.codeplex.com/

    Key features:

    • Gesture recording
    • Gesture recognition
    • Save gestures to file
    • Tweakable parameters for fine control of how the recogniser works
    • Works out of the box - just run the sample app
    • Source code available too if you want to make your own gestures

    Please try this out and if you find it useful I'd love you to contribute to the project. There are a hundred things I can think of that could be done better but this is a great starting point for anyone who is interested in vector-based recognition with the Kinect.

    Furthermore, I'd love it if you shared your recorded gestures and DTW settings so that the whole community can benefit from your experiments!

    Thanks,

    Steve

    Saturday, July 30, 2011 6:52 PM

All replies

  • How hard would it be to implement 3D gesture recognition?
    Wednesday, August 03, 2011 7:27 AM
  • Very easy. You only have a few lines to modify. Just extract more coordinates in SkeletonDataExtract.
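    For instance, a 2D-to-3D change could look roughly like this (an illustrative sketch only; apart from SkeletonDataExtract, the type and method names are hypothetical stand-ins, not the project's actual code):

    ```csharp
    using System.Collections.Generic;

    public static class SkeletonDataExtract3D
    {
        // Stand-in for a joint position as delivered by the SDK.
        public struct JointPos { public double X, Y, Z; }

        // Flatten the tracked joints into one per-frame observation vector.
        // The 2D version emits 2 values per joint; adding Z grows a 6-joint
        // observation from 12 to 18 dimensions. The DTW distance itself is
        // dimension-agnostic, so nothing else in the recognizer needs to change.
        public static double[] ToObservation(IList<JointPos> joints)
        {
            var obs = new double[joints.Count * 3];
            for (int i = 0; i < joints.Count; i++)
            {
                obs[3 * i + 0] = joints[i].X;
                obs[3 * i + 1] = joints[i].Y;
                obs[3 * i + 2] = joints[i].Z; // the only substantive addition
            }
            return obs;
        }
    }
    ```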
    Wednesday, August 03, 2011 8:29 AM
  • Yep. It's as simple as Rhemyst said. I'll post an example on the Codeplex site soon.
    Wednesday, August 03, 2011 6:27 PM
  • A good framework.

    Had some queries:

     a. Does it require the speed and duration of the recorded gesture and the performed gesture to be the same? If not, how is this handled without getting into user-specific gesture definitions?

     b. What is the rationale behind the 32 frames used while recording? How are short and long gestures handled?

     c. Do you have any write-up on the algorithm behind the recognition logic? (I can try to reverse engineer it to analyse and understand it, but it would be better to get it from the author himself.)

     

    Some of the tweaks we are planning to do on top of this source code are:

     a. Allow multiple recordings for a gesture. This is to increase the hit rate of gestures.

     b. Add legs and other joints to the recognition.

     c. Separate the Recognizer into a smaller app, keeping the recorder in an admin-mode kind of pattern.

     d. Create a trainer so that users can practice the gestures after recording them. Maybe even play clips and provide guidance.

    We will upload this code back once done.

    Thursday, August 25, 2011 8:24 AM
  • a : Set MaxSlope to 0.

    b : Nothing rational. You may modify the code to record a sequence of a different length. The DTW recognizer can handle gestures of different lengths (though in that case you'll need to normalize the computed DTW distance).

    c : I wrote it after reading the Wikipedia DTW algorithm page and this page: http://web.science.mq.edu.au/~cassidy/comp449/html/ch11s02.html

    Have you checked this page? http://social.msdn.microsoft.com/Forums/en-US/kinectsdknuiapi/thread/4a428391-82df-445a-a867-557f284bd4b1

     

    I'll see if I can find some time to REALLY explain how this algorithm works.
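    In the meantime, here is the skeleton of it as a simplified, self-contained sketch (not the exact project code; the per-frame cost here is plain Euclidean distance, and the length normalization is the kind mentioned in point b above):

    ```csharp
    using System;

    public static class MiniDtw
    {
        // Classic DTW: tab[i, j] = cost(a[i-1], b[j-1]) plus the cheapest of
        // the three predecessor cells. Dividing the final score by the two
        // sequence lengths makes gestures of different lengths comparable.
        public static double Distance(double[][] a, double[][] b)
        {
            int n = a.Length, m = b.Length;
            var tab = new double[n + 1, m + 1];
            for (int i = 1; i <= n; i++) tab[i, 0] = double.PositiveInfinity;
            for (int j = 1; j <= m; j++) tab[0, j] = double.PositiveInfinity;

            for (int i = 1; i <= n; i++)
            {
                for (int j = 1; j <= m; j++)
                {
                    double cost = Euclidean(a[i - 1], b[j - 1]);
                    tab[i, j] = cost + Math.Min(tab[i - 1, j - 1],
                                       Math.Min(tab[i - 1, j], tab[i, j - 1]));
                }
            }
            return tab[n, m] / (n + m); // length-normalized score
        }

        private static double Euclidean(double[] x, double[] y)
        {
            double s = 0;
            for (int k = 0; k < x.Length; k++) s += (x[k] - y[k]) * (x[k] - y[k]);
            return Math.Sqrt(s);
        }
    }
    ```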


    Thursday, August 25, 2011 3:21 PM
  • Hi Rymixxxx, wow, great job! This looks really helpful. I'm an amateur Kinect developer; I downloaded the code and it works perfectly, but I want to understand how the recognition in this program works.
    That is, what coordinates are stored in the file? They don't look like the plain X, Y, Z joint coordinates; is there more to the values that appear there, or are they the same as the joints'?

    I'm developing a simple project where this knowledge would help a lot. So, what should I take into account for recognition? In my project I added the class DtwGestureRecognizer.cs, and then what?

    In your project I recorded some gestures and had them recognized; it created a text file with that day's date, and that day it worked perfectly, but when I run it now the program does not recognize them. Can I rename the file so that it is unique and recognition always works?

    For my project I just need that text file storing the gesture coordinates, and for it always to be recognized. What should I do?

    Thanks a lot, and sorry for so many questions...


    Migue(MakitoMakito)
    Monday, September 05, 2011 8:11 PM
  • If you check the links that Rhemyst suggested above, you have a good starting point.

    We made some tests changing the code; you can explore different optimizations depending on your scope.

    It's a really cool and interesting approach.

    Ugo


    Thursday, September 08, 2011 2:36 PM
  • @ Rymixxxx

    I have two questions about the project.

    1) Around line 275 in DtwGestureRecognizer.cs you use this piece of code to calculate the best match.

    // Find best between seq2 and an ending (postfix) of seq1.
    double bestMatch = double.PositiveInfinity;
    for (int i = 1; i < (seq1R.Count + 1) - _minimumLength; i++)
    {
        if (tab[i, seq2R.Count] < bestMatch)
        {
            bestMatch = tab[i, seq2R.Count];
        }
    }

    Why exactly do you use (seq1R.Count + 1) - _minimumLength in the for statement?

    2) Why do you normalize the coordinates by the distance between the shoulders?

    Thanks in advance, and thanks a lot for the nice project you've posted; it's very useful!

    Tuesday, September 20, 2011 9:07 AM
  • 1 ) Gotta say, I can't really recall... that piece of code was just a first draft I posted to let others take a look at a DTW approach.

    2 ) The raw joint coordinates are centered on the Kinect. You need to center them on the shoulders, otherwise recognition would depend on where you stand relative to the Kinect. I also decided to normalize by the distance between the shoulders so that recognition doesn't depend on the user's height. I'm not sure today whether that was needed or not.
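    A sketch of what that normalization amounts to (illustrative names, not the project's exact code):

    ```csharp
    using System;

    public static class ShoulderNormalize
    {
        public struct Point2 { public double X, Y; }

        // Re-express a joint position relative to the shoulder center and
        // scale by the shoulder-to-shoulder distance, so the observation no
        // longer depends on where the user stands or how tall they are.
        public static Point2 Normalize(Point2 joint, Point2 leftShoulder, Point2 rightShoulder)
        {
            double centerX = (leftShoulder.X + rightShoulder.X) / 2;
            double centerY = (leftShoulder.Y + rightShoulder.Y) / 2;
            double dx = rightShoulder.X - leftShoulder.X;
            double dy = rightShoulder.Y - leftShoulder.Y;
            double shoulderDist = Math.Sqrt(dx * dx + dy * dy);
            return new Point2
            {
                X = (joint.X - centerX) / shoulderDist,
                Y = (joint.Y - centerY) / shoulderDist
            };
        }
    }
    ```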

    Tuesday, September 20, 2011 11:16 AM
  • Hello,

    I've been testing the code and it's going well.

    Now I want to record a gesture and have it recognized anywhere on the screen (e.g. if I record the gesture at the top of the screen, it should also be recognized at the bottom).
    Tuesday, November 29, 2011 11:11 AM
  • If you are interested in a complete, working program (including source code and documentation, with a description of how DTW works and how it needs to be set up), have a look at the Kinetic Space project at http://code.google.com/p/kineticspace

    To see a video what you can do, see: http://www.youtube.com/watch?v=e0c2B3PBvRw

     

    Kinetic Space provides a tool which allows everybody to record and recognize customized gestures using depth images as provided by the PrimeSense PS1080, the Kinect or the Xtion sensors. The software observes and comprehends the user's interaction by processing their skeleton. The analysis routines not only detect simple gestures such as pushing, clicking, forming a circle or waving, but also recognize more complicated gestures such as those used, for instance, in dance performances or sign language.

    Five highlights of the software are that the gestures:

    • can be easily trained: the user simply trains the system by recording the movement/gesture to be detected, without having to write a single line of code
    • are person-independent: the system can be trained by one person and used by others
    • are orientation-independent: the system can recognize a gesture even if it is not performed in the same orientation as the trained one
    • are speed-independent: the system can recognize a gesture even if it is performed faster or slower than the training example, and can report this information
    • can be adjusted: the system and gesture configuration can be set up via an XML file

    The software has already been used by media artists, dancers and the like to connect to and control a wide range of third-party applications such as Max/MSP, Pure Data, VVVV, Resolume, etc. via the OSC protocol. The software is written in Processing and based on SimpleOpenNI, OpenNI and NITE.

    Monday, January 09, 2012 12:11 PM
  • How exactly can you modify the code to use the cursor's position rather than limb positions?

    I took out the six joint position recorders and replaced them with the cursor location, but how can I modify the DTW equation and the gesture text file so that a gesture is recognized?
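    For reference, my replacement observation looks roughly like this (my own names, not from the project):

    ```csharp
    // Hypothetical adapter: treat the on-screen cursor as a one-joint skeleton.
    // The DTW recognizer only ever sees fixed-length double arrays per frame,
    // so a 2-element cursor observation drops in where the 12-element joint
    // observation was; the gesture text file then stores 2 values per frame.
    public static class CursorObservation
    {
        public static double[] FromCursor(double x, double y,
                                          double screenWidth, double screenHeight)
        {
            // Normalize to [0, 1] so recognition doesn't depend on resolution,
            // loosely mirroring the project's shoulder-based normalization.
            return new double[] { x / screenWidth, y / screenHeight };
        }
    }
    ```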

    Monday, January 23, 2012 10:46 PM
  • I know no one has posted here in a while, so I thought I would. OP, can you please explain how to record and compare two gestures? For instance, I want to record my own gesture (a sign-language word) to a file, then save another person's gesture to a file and compare the actual skeleton data. Is that possible? I haven't seen anything remotely like that yet, unless I need to drop the 32 fps down to something lower to capture actual skeleton data to file.

    Once you eliminate the impossible, whatever remains, no matter how improbable, must be the truth. - "Sherlock holmes" "speak softly and carry a big stick" - theodore roosevelt. Fear leads to anger, anger leads to hate, hate leads to suffering - Yoda

    Thursday, February 23, 2012 7:44 PM
  • Hi, this is a great application, fantastic! I need help: I have to recognize two types of gesture: a hand opening to the side, to move a block in a Tetris game, and a hand rotation, to rotate the block. I can't find good threshold values to recognize these gestures. Any suggestions? Thanks. ;)
    Saturday, March 31, 2012 7:06 AM
  • I think the author has abandoned his DTW project, so I'm going with Kinect Toolbox on CodePlex.


    Tuesday, April 03, 2012 4:58 PM
  • Hello sir, I need help with your program. Can you explain what the _video array does and how it affects the program? I'm basing my project on your program. Thank you :)
    Friday, September 14, 2012 9:40 AM
  • Hello, 

    I'm not sure if anyone is going to answer me on this old thread, but here I go...

    As I understand it, you take an observation Ai = {left_wrist, right_wrist, left_elbow, right_elbow, left_hand, right_hand}, where e.g. left_wrist = {left_wrist.X, left_wrist.Y}...

    ...and then you store one set of observations as a sequence Seq_A = {A1, A2, A3, ...} and another set as Seq_B = {B1, B2, ...}.

    When you calculate the distances between both sequences to fill the matrix, how do you do it?

    Option A: Dist[i,j] = sqrt{ (left_wrist.xA-left_wrist.xB)^2 + (left_wrist.yA-left_wrist.yB)^2 + (right_wrist.xA-right_wrist.xB)^2 + (right_wrist.yA-right_wrist.yB)^2 + ... + (right_hand.xA-right_hand.xB)^2 + (right_hand.yA-right_hand.yB)^2 }

    Option B:

    Dist[i,j]= sqrt{ (left_wrist.xA-left_wrist.xB)^2+(left_wrist.yA-left_wrist.yB)^2}

    Dist[i,j+1]=sqrt{(right_wrist.xA-right_wrist.xB)^2+(right_wrist.yA-right_wrist.yB)^2}

    ...

    Dist[i,j+5]=sqrt{(right_hand.xA-right_hand.xB)^2+(right_hand.yA-right_hand.yB)^2}

    I wish to understand whether, for each matrix component [i,j]:

    1 - you calculate the distance between the 12-dimensional observations A and B, or

    2 - you take the first Seq_A and Seq_B components (i.e. left_wrist at frame 0) and calculate the Euclidean distance for that component alone; then, in the next element of the matrix, you calculate the distance between the first Seq_A component and the second Seq_B component (i.e. d(left_wristA, left_elbowB))...

    I think that you calculate, for each Matrix[i,j], the Euclidean distance between both observations:

    Dist[i,j] = sqrt( (Seq_A[0]-Seq_B[0])^2 + (Seq_A[1]-Seq_B[1])^2 + ... + (Seq_A[11]-Seq_B[11])^2 ), where [0], [2], ... are the joints' X and [1], [3], ... the joints' Y.

    Am I right?
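    In code, what I mean by interpretation 1 is roughly this (my own sketch, not the project's source):

    ```csharp
    using System;

    public static class ObservationDistance
    {
        // One scalar cost per DTW cell: the Euclidean distance between the
        // full 12-dimensional observation vectors (6 joints x 2 coordinates),
        // computed jointly rather than joint by joint.
        public static double Dist(double[] obsA, double[] obsB)
        {
            double sum = 0;
            for (int k = 0; k < obsA.Length; k++)
                sum += (obsA[k] - obsB[k]) * (obsA[k] - obsB[k]);
            return Math.Sqrt(sum);
        }
    }
    ```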

    I hope you can understand me. Thanks.


    Friday, October 26, 2012 1:13 PM
  • I faced a problem running your project.

    I don't know if the problem is with my Visual Studio, but it always reports an error on this line: "using Microsoft.Research.Kinect.Nui;"

    The error is exactly on the word Research.

    Friday, November 16, 2012 12:52 PM