Kinect depth angle

  • Question

  • Hello all!

     

    I am making a Kinect interface to interact with a multi-tile display, using the skeleton tracking provided by the SDK.

    Because there are many displays, I cannot place the Kinect in a comfortable position in front of the user, as it would obstruct the view.

    Currently, I have the Kinect sensor below the bottom screen, angled up as much as possible.

    However, the depth stream is now skewed because the Kinect's coordinate system isn't aligned with the user's.

    This is causing some very funky instability and inaccuracy; does anyone have recommendations for getting more accurate data?

    One way I was thinking of resolving this was to rotate the skeleton joints about the Kinect's x axis, but I am not sure if this is OK or possible.

    Thanks!


    Thursday, August 18, 2011 2:20 PM

Answers

  • UnsoundOfMind,

    What you need to do is a coordinate transform from the Cartesian space defined by the basis vectors of the Kinect's point of view (let's call them KV) into the Cartesian space defined by the desired basis vectors (let's call these DV).

    When the camera is not tilted at all, KV and DV are exactly the same, so, since this is the desired vector space, for simplicity we can use the standard unit vectors to represent the axes:

    x: [1, 0, 0]
    y: [0, 1, 0]
    z: [0, 0, 1]

    Now, when you tilt the camera upwards by an angle A, the x axis stays the same but the y-z plane rotates by A (i.e. it corresponds exactly to a counter-clockwise rotation about the x axis), so the basis vectors for KV (in terms of the basis vectors for DV) are now

    x: [1,      0,      0]
    y: [0,  cos A, -sin A]
    z: [0,  sin A,  cos A]

    To convert coordinates relative to KV into coordinates relative to DV, you perform a matrix multiplication between the transformation matrix defined by these basis vectors for KV (http://en.wikipedia.org/wiki/Transformation_matrix) and a joint position vector that you receive from the Kinect API. This will result in a joint position vector relative to DV.
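
    For illustration only, here is a rough sketch of that multiplication in Python/NumPy (the actual Kinect SDK is not Python, so treat this as pseudocode; the names kv_to_dv, joint_kv and tilt_angle_rad are made up, and the sign of the angle may need flipping depending on your conventions):

    import numpy as np

    def kv_to_dv(joint_kv, tilt_angle_rad):
        # Joint position (x, y, z) as reported by the skeleton API, in KV coordinates.
        p = np.asarray(joint_kv, dtype=float)
        a = tilt_angle_rad

        # KV basis vectors expressed in DV coordinates (exactly the vectors listed above).
        kv_x = np.array([1.0, 0.0, 0.0])
        kv_y = np.array([0.0, np.cos(a), -np.sin(a)])
        kv_z = np.array([0.0, np.sin(a), np.cos(a)])

        # Stack them as the columns of the change-of-basis matrix and apply it to the joint.
        m = np.column_stack([kv_x, kv_y, kv_z])
        return m @ p

    # A point 2 m along the tilted optical axis, with the sensor tilted up 20 degrees,
    # comes out roughly 0.68 m up and 1.88 m out in the user-aligned frame.
    print(kv_to_dv((0.0, 0.0, 2.0), np.radians(20.0)))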

    Does that make sense?
    Eddy

     

     


    I'm here to help
    Friday, August 19, 2011 6:33 PM

All replies

    Regarding the inaccuracy, do you mean that skeleton tracking produces weird results, e.g. in sample applications such as C:\Users\Public\Documents\Microsoft Research KinectSDK Samples\NUI\SkeletalViewer? Also, are results more accurate when you place the Kinect sensor temporarily in another place that is less convenient for you in the longer term?

    Also, are your legs fully visible when you point the Kinect sensor up in the way you mention? Kinect SDK Beta only supports full-body tracking, and the legs have to be visible or else results will be inaccurate.

    Have you tried Kinect mounts such as the TV/computer screen mounting clip available from http://www.pdp.com/M-8-xbox-360.aspx? There are also wall mounts and floor stands available.

    You can do whatever coordinate transform you want on the skeleton joints after you receive them, but the Kinect SDK Beta does not have a method to perform a coordinate transform before converting depth data into skeleton data.

    Hope this helps,
    Eddy


    I'm here to help
    Thursday, August 18, 2011 10:00 PM
  • The skeleton tracking does produce weird results due to the angled tilt of the Kinect.

    I am using the skeleton tracker to retrieve the joint (x,y,z) points in my program.

    When the Kinect is placed in a flatter, more favorable position the application is fine; it's when it is placed on the ground with an upward tilt that the data becomes skewed.

    The legs are also fully visible, as the Kinect SDK Beta only supports full body tracking.

    Since the Kinect is angled up, it sees the user as standing leaning backwards, which makes the gesture recognition a little awkward and inaccurate.

    That mounting clip looks awesome, but unfortunately there's not enough clearance behind the screen to fit it in.

    Would you have any tips or suggestions on realigning the skeleton data with the real world so that the data is not skewed?

    All I can think of so far is to rotate the skeleton joints after I receive them, but I'm not sure which axis to rotate them around, or whether the depth data provided is to the same scale as the x/y data.

    Thanks,

    Matt

    Friday, August 19, 2011 5:24 AM
  • Hi UnsoundOfMind532,

     

    Your problem is intrinsic to the angle - the relative position according to the Kinect is of course that you are leaning back, as your head's depth position is much greater than your feet's.

     

    There are a couple of workarounds that I can see:

    1) Take the average of head.Z and feet.Z, and use that for every Z co-ordinate, as it will provide a level plane for all joint recognition.

    2) Instead of using the X and Y planes to detect gestures, you could use a combination of the X, Y and normalised Z plane from 1) and calculate the absolute magnitudes of the movement vectors - a lot more awkward, and only 'simple' if you have a good grasp of 3D vector mathematics.

    3) As you know the angle of the Kinect (which I will assume [and hope] is level) and the distance of each joint from the Kinect, you can work out the horizontal distance from the Kinect to the user using the co-ordinate closest to Y = 0 (i.e. the centre of the camera's vision, which corresponds to the camera angle).

     

    Solutions:

    A) Because of the limited range of angles that the Kinect can be tilted through, equating all of the Z positions to the normalised plane might solve your problems, because I wouldn't have thought that much X and Y movement was lost.

    B) The other option is to use the normalised plane and the Z plane that the camera detects (which can be calculated from the line that the Z co-ordinates of the user's body create). Simple trigonometry then gives you the angle of rotation between the normalised plane and the camera's detected plane (the rotation is in the Y and Z axes). This can then be used to transform each of your joints by rotating them in the Y and Z planes according to the magnitude of their distance from the rotational point, which I would assume would be 0 in the Y plane.
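
    To make B more concrete, here is a rough Python sketch (made-up names; it assumes the user stands upright while the angle is estimated, and the sign may need flipping for your setup):

    import math

    def estimate_tilt(head, ankle):
        # head and ankle are (x, y, z) joint positions from the skeleton stream.
        # For an upright user and a level sensor, the depth difference between
        # head and feet would be ~0, so any difference is attributed to the tilt.
        dy = head[1] - ankle[1]
        dz = head[2] - ankle[2]
        return math.atan2(dz, dy)

    def untilt(joint, angle):
        # Rotate a joint in the Y-Z plane (about the X axis) to undo the tilt.
        x, y, z = joint
        return (x,
                y * math.cos(angle) + z * math.sin(angle),
                -y * math.sin(angle) + z * math.cos(angle))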

     

    Hope that helps and makes sense!

    Friday, August 19, 2011 11:14 AM
  • Oops, I should have clarified that I need all three of X, Y, Z, as some gestures are dependent on the depth, so I cannot use solution A.

     

    However, I am a bit confused by your solution B, haha.

     

    What is the normalized plane?

    And by this Z plane, do you mean the plane that contains the line from the Kinect's origin to a joint, for detecting the angle between the two planes?
    Could you explain this a bit more? I'm a bit confused and noobish.


    Here is a picture I found on the internet that describes my problem:
     http://bb-attachments.cycling74.com.s3.amazonaws.com/2001.KinectThresholdexplanation.jpg

    The blue line is what I would like the Z values from the Kinect to be quantified against, not the green line, which is how it is perceived when tilted.

    I have certain gestures that rely on X, Y, Z coordinates, so when the camera is tilted, I lose accuracy on the Y and Z due to the skewed view.


     Thanks!

    Friday, August 19, 2011 11:57 AM
  • What I mean by the normalised plane is that if you take the average of the head and feet Z co-ordinates and equate all Z co-ordinates to that, then you have a single plane on which all joints exist - but if your application is depth-dependent, then it's not ideal!
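
    As a tiny sketch of that flattening (made-up names, assuming joints are (x, y, z) tuples):

    def flatten_depth(joints, head, feet):
        # Put every joint on the plane halfway between the head and feet depths;
        # all depth information is deliberately discarded.
        z_avg = (head[2] + feet[2]) / 2.0
        return [(x, y, z_avg) for (x, y, z) in joints]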

    By the Z plane, I mean the 3D equivalent of the lines on the diagram - all points on the plane are given as the same distance away from the Kinect.

    As far as I understand the working of the Kinect, I think that the diagram is wrong - the blue line is what is already perceived by the Kinect, as the Z distance is not consistent, which is what you have in your program - the user is leaning back; the green line is what you want, where the user is perceived as being orthogonal to the Kinect, and all Z co-ordinates are on a common plane according to the Kinect.

     

    Solution B in my first post describes the picture - re-read it, but in terms of the diagram (and my previous comment about the error in the diagram), the normalised plane is the green line and the plane detected by the Kinect is the blue line. The angle can be calculated and the transform is fairly straightforward.

    Friday, August 19, 2011 12:16 PM
  • Alternatively, using an approach very similar to what Eddy gave, you could place the Kinect device to the side of the monitors at the correct height (so you don't have to tilt it), angled maybe 15 or 20 degrees — whatever works — and then rotate the skeleton in world-space about its vertical axis to compensate.  Too much of an angle may occlude one of the user's arms, but if the user's arms are mostly in front of him/her, it shouldn't matter.
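
    For what it's worth, a minimal sketch of that compensation (made-up Python names; the angle here is the sensor's known yaw towards the user, and its sign depends on which side of the displays the sensor sits):

    import math

    def rotate_about_vertical(joint, angle):
        # Rotate a joint position about the vertical (y) axis so the skeleton is
        # expressed relative to the displays rather than the off-to-the-side sensor.
        x, y, z = joint
        return (x * math.cos(angle) + z * math.sin(angle),
                y,
                -x * math.sin(angle) + z * math.cos(angle))
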
    Friday, September 23, 2011 4:27 PM