Hand tracking details in new github UWP skeletal tracking sample?

  • Question

  • My app uses hand pose tracking. I want to port it to UWP so I was happily looking at the new UWP GitHub sample code. But unfortunately I see this in MFPoseTrackingFrame.h:

    //      for hand tracking: enumerated type is not yet defined.

    That is... too bad :-(  Is this an enumeration that was defined in the V2 SDK?  Or is this an enumeration that is not yet specified anywhere, and which we need Microsoft to publish?

    I will experiment with this, but some clarity would be great.  Thanks.

    Rob Jellinghaus

    Monday, November 7, 2016 5:40 PM

All replies

I guess by hand tracking pose, you mean whether it's lasso, open, closed, or idle. Right?

    Not sure but I think you need to check this sample.

It's gonna be tricky because they seem to have renamed Joint to TrackedPose, so correlating things will be confusing.

On the other hand, it seems they have accounted for you getting everything as it used to be by exposing the frame as a byte buffer (as I understand it, at least). So you could try BufferMediaFrame and use it to grab the BodyFrame data from the SDK. Here they say that it can be done. If you reinterpret the buffer and grab the part of the frame that used to be the hand tracking pose... perhaps...

    Tuesday, November 8, 2016 8:56 AM
Data layout is definitely not matching the BodyFrame / Body classes present in the Kinect 2 SDK. E.g., the data that represent Joint & JointOrientation now have their own TrackingStates and sit as a single block (struct) in the buffer (see the TrackedPose struct).
According to the header file MFPoseTrackingFrame.h, there should be two different PoseSets (one for body tracking and another for hand tracking), distinguished by different pre-defined GUIDs in the "PoseTrackingEntityData" block.

Currently (driver from 17.10.2016, ver. 2.2.1610.17001), the custom Perception frame contains only 6 "PoseTrackingEntityData" blocks - all with the PoseSet_BodyTracking GUID. No PoseTrackingEntityData blocks with PoseSet_HandTracking are present.
    Following the data format description found in mentioned header file:

    //   3. PoseTrackingEntityData #0: First entity data. Common structure for all providers.
    //      1. DataSizeInBytes: Size of entire entity (PoseTrackingEntityData + custom data) in bytes (4 bytes).
    //      2. EntityId: Allows correlation between frames (16 bytes).
    //      3. PoseSet: Guids for Body (such as Kinect), Handtracking, etc.: defines the meaning of each Pose (16 bytes).
    //      4. PosesCount: Count of poses in this entity data (4 bytes).
    //      5. IsTracked: Whether or not this entity is being tracked (1 byte).
    //      6. Poses: Array of common structure TrackedPose (sizeof(TrackedPose) * PosesCount bytes).
    //      7. Customer specific data for this entity (DataSizeInBytes - sizeof(PoseTrackingEntityData) - sizeof(TrackedPose) * (PosesCount - 1) bytes).
    //   4. Provider-specific data for entity #0 in this frame.

There is space for "Customer specific data for this entity" and "Provider-specific data for entity", so I jumped into investigating the content of the buffer:
The offsets (read from PoseTrackingFrameHeader) between PoseTrackingEntityData blocks are 1012 B.
PoseTrackingEntityData reports a size of 1012 B -> no space for "Provider-specific data for entity".
The "BodyTracking" data (25 joints + overhead) occupy the initial 941 B.

Voila - we have our "Customer specific data for this entity".
To cut the story short: after reading the body joint data, skip 35 bytes into the "Customer specific data";
there you have:
LeftHandState (4 B, see the HandState enum in the Kinect 2 SDK)
LeftHandConfidence (4 B, see the TrackingConfidence enum in the Kinect 2 SDK)
RightHandState (4 B)
RightHandConfidence (4 B)

    • Proposed as answer by Jan Marcincin Friday, November 11, 2016 12:39 AM
    • Edited by Jan Marcincin Friday, November 11, 2016 12:56 AM
    Friday, November 11, 2016 12:35 AM
  • Great job man.

    Some observations and questions

First off, the data layout doesn't match the old Body class because they are trying to abstract away an API that can work with a multitude of cameras. It's the same way they abstracted away the Color, IR, etc. streams so that you can use the same API whether there's a Kinect or a RealSense as the device.

Also, going by the same logic, different PoseSets for Body and Hand are required because Kinect has a very minimalistic hand tracking module, where LeapMotion or RealSense might have a high-definition hand tracking module. To be honest, I expected another one for Face, but I think it might be somewhere else, like the Media Capture API, since the data might be similar to what phones use for face detection etc.

I wonder what the difference is between "Provider-specific data for entity" and "Customer specific data for this entity". Both sound like per-frame metadata, but Provider seems like data coming from the device manufacturer, whereas Customer sounds like something the users could attach (kinda like the metadata you can attach in Kinect Studio to each frame of a recording...). The titles are confusing.

By the way, you mention that the BodyTracking data is in the initial 941 B, but you are assuming 25 joints + overhead. Kinect doesn't put the complete skeleton structure in each frame; instead it includes just the joints it thinks it has, along with a joint counter prior to the joint position buffer that says "in this frame we have 23 joints, so read the rest of the buffer expecting 23 blocks of joint data." Shouldn't that be part of the process? I'm guessing you had the full 25 because the data set had a fully visible/tracked skeleton.

    Friday, November 11, 2016 8:09 AM
I know that the idea was to abstract the interface to allow multiple devices of this kind to use it - that's what we heard from the moment of the "hackjob" of gluing the Kinect output to the Perception namespace a year ago. A nice idea indeed - it just isn't implemented that way for anything other than "bitmaps".

My observation is that the Kinect 2 SDK BodyFrame and the current custom Perception frame always have 6 body entities - even if only 1 person (or no one, in the case of UWP) is being tracked. Instead, the IsTracked flag defines whether a given Body (skeleton) is tracked. This could, in theory, be optimized by reporting a different EntityCount in the header of the perception buffer.

PosesCount (joint count) inside a PoseTrackingEntityData with PoseSet_BodyTracking must always be 25 or more (in theory, some "undef" joint types could be used for a 26th+ joint) in the current buffer data format, because the BodyPart (JointType in the SDK) of a "TrackedPose" (Joint) is not part of the TrackedPose block/struct; it is assumed based on its index in Poses (the last part of PoseTrackingEntityData).

    Friday, November 11, 2016 9:56 AM