Using Depth Sensor - how do I get X/Y Co-ordinates?

  • Question

  • Hey all,

    So I'm planning a project where I use the depth sensor only (640x480) to show the outline of a shape, but I want to get its width or length at several points.

    How would I go about this? Do I have to figure out the width of the camera's field of view at certain depths and calculate from the pixels the shape occupies, or are there actual ways of doing these measurements?

    Thanks

    Wednesday, June 22, 2011 3:00 AM


All replies

  • crucible,

    Have you looked at the SkeletonEngine.DepthImageToSkeleton method? It is meant to map from "depth image space" (pixel coordinates + depth value in the depth image) to the "skeleton coordinate space" described on page 22 of the programming guide (http://research.microsoft.com/en-us/um/redmond/projects/kinectsdk/docs/ProgrammingGuide_KinectSDK.pdf). Note that in skeleton coordinate space, dimensions are given in meters from an origin at the camera location.
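
    To make the call shape concrete, here's a minimal sketch (pixelX, pixelY and depthPixels are hypothetical placeholders; see the follow-ups below for the exact depth format the method expects):

        int pixelX = 160, pixelY = 120;                            // a pixel in a 320x240 depth image
        short rawDepthValue = depthPixels[pixelY * 320 + pixelX];  // raw value from the depth stream
        float normX = pixelX / 320f;                               // normalize pixel co-ords to 0..1
        float normY = pixelY / 240f;
        Vector worldPos = nui.SkeletonEngine.DepthImageToSkeleton(normX, normY, rawDepthValue);
        // worldPos.X, worldPos.Y, worldPos.Z are metres from the camera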

    Hope this helps,
    Eddy


    I'm here to help
    Wednesday, June 22, 2011 7:33 AM
  • Hey Eddy,

    I run into problems when I try to get the pixel coordinates and the depth value. In order to get both, don't we need both the image frame and the skeleton frame? What is the proper way of handling this?

    Also, how exactly would you get pixel coordinates and depth value?

    Is the depth value (firstFrame >> 3 | secondFrame << 5) or just (firstFrame >> 3)?

    Thanks,

    Gilbert

    Wednesday, June 22, 2011 9:06 PM
  • Thanks for that Eddy, but just to confirm for myself - I can use the DepthImageToSkeleton method to find the X,Y,Z co-ordinates of any pixel from my Depth image, without needing a valid skeleton in frame?

    What I want to do is scan objects and get their width/height so we can do some calculations based on materials passing the camera - so there will not be anything remotely like a skeleton present.

    What sort of "resolution" are the measurements at? I've seen some hacks that show pretty accurate drawings of rooms, and this YouTube video (http://www.youtube.com/watch?v=7TGF30-5KuQ&feature=related) where the guy measures a box to within half a cm. Would that be achievable with this sort of approach?

    Thanks

    Wednesday, June 22, 2011 11:01 PM
  • Hey Eddy,

     

    I'm sorry, I'm going to make a pain out of myself I'm betting ;)

    So I had a look at nui.SkeletonEngine.DepthImageToSkeleton... but I have a few questions.

    What I've done is map an Image (in WPF) to the same size as the camera stream, 320x240 (with PlayerData, as in the SkeletalImage demo).

    I've handled the mouse button click event on the Image, and got the pixel co-ordinates of the click.

    From there, I want to grab that pixel and get its X,Y,Z co-ords in real space.

    Am I after something like:

                var s = nui.SkeletonEngine.DepthImageToSkeleton((float)pos.X, (float)pos.Y, (short)realDepth);

    Where X is the pixels on the X-axis, Y the pixels on the Y-axis, and realDepth is, as in the sample, the result of int realDepth = (_image.Bits[pixel + 1] << 5) | (_image.Bits[pixel] >> 3);

    For starters, realDepth is an int here and not a short, so I'm not 100% clear whether that's the right type or the right value for depth, given the samples declare it as an int and yet the method requires a short...

    And I'm not sure whether to use x and y pixel co-ords, because the vectors I'm getting back are like 21, -13, 0.427 - and if these are in metres, then only the distance seems plausible.

    Do I have to turn the pixel co-ords into floats from 0 to 1, like the SkeletonToDepthImage method seems to return?

    Thanks

    Thursday, June 23, 2011 11:19 AM
  • So, at the moment I'm doing:

    var vecpoint1 = nui.SkeletonEngine.DepthImageToSkeleton((float)(pos.X / 320), (float)(pos.Y / 240), (short)realDepth);

    var vecpoint2 = nui.SkeletonEngine.DepthImageToSkeleton((float)(pos2.X / 320), (float)(pos2.Y / 240), (short)realDepth2);

    width = Math.Abs(vecpoint2.X - vecpoint1.X);

    width = width * 1000; // metres to mm

    I'm getting around 10cm off on a 50cm measurement, and 3cm off on a 20cm measurement.

    Is there anything I'm getting obviously wrong with this?

    Thursday, June 23, 2011 11:23 PM
  • Hi Crucible,

    I played around with the int and short conversions in the C# SkeletonTracker sample app and got the same response whether realDepth was cast to int or short, so I suspect it makes no difference.

    I'm finding I'm off by about 20 cm in both width and height of an object measuring roughly 60 cm wide x 70 cm tall, located in the center of the view at 50 cm depth, when I scale my x and y positions between 0 and 1. Either this is a base amount of error, or we're not scaling our x and y coordinates correctly, or maybe it's something else.

    Can anyone give feedback on the proper way to format the xDepth, yDepth inputs for the DepthImageToSkeleton function? 

    Friday, June 24, 2011 2:42 PM
  • Thanks Susan, glad it's not me.

    Anyone else getting this scaling issue, or know how to resolve it?

    Monday, June 27, 2011 1:15 PM
  • Okay, I might be onto something... From the SDK .chm help file on SkeletonEngine.DepthImageToSkeleton:

    Parameters

    depthX: The X coordinate of the depth pixel, as a floating-point number between 0.0 and 1.0.

    depthY: The Y coordinate of the depth pixel, as a floating-point number between 0.0 and 1.0.

    depthValue: The depth value of the depth image pixel, in millimeters, shifted left by 3. The left-shift enables you to pass the value from the depth image directly into this function.

    So basically it's expecting the raw value straight from the 320x240 DepthImageWithPlayerIndex stream (with the depth still shifted left by 3) passed in as the depth, whereas I was passing it the actual depth in mm. Chances are this would drastically change the perceived X/Y co-ordinates as well, if the scaling is wrong.

    It also means that if you were using the stream without the player index, and/or at 640x480, you'd have to change the bit-shifting on the depth values to match - which seems the wrong way to go about it, given they've publicised how to get the real depth, but what the heck.

    Wednesday, June 29, 2011 12:14 AM
  • So, it turns out this is a case of believing the documentation:

                // Normalize pixel co-ords to 0..1; shift the real depth (mm) left by 3,
                // since the method expects the still-shifted depth-image format.
                vp1 = nui.SkeletonEngine.DepthImageToSkeleton((float)vx1 / 320, (float)vy1 / 240, (short)(vz1 << 3));
                vp2 = nui.SkeletonEngine.DepthImageToSkeleton((float)vx2 / 320, (float)vy2 / 240, (short)(vz2 << 3));
    Got me two points - I checked vp2.X - vp1.X and got back a result in metres.
    (vp2.X - vp1.X) * 100 = cm
    (vp2.X - vp1.X) * 1000 = mm
    And it was pretty accurate... measuring a box fairly front-on came out between almost exact and 10mm out, but that was caused by bleeding pixels at the edge of the box, which is understandable (I was letting the computer self-measure).
    Perfect - sorry for not reading properly to begin with!
    Note: this assumes your z-axis values are the depth in mm, having used the conversion from the camera - int realDepth = (depthFrame16[i16 + 1] << 5) | (depthFrame16[i16] >> 3);
    If you are using that stream (DepthAndPlayerIndex), you could just pass the raw pixel value straight in; however, if you're using Depth only, you'll need to convert to mm and then shift left by 3 (<< 3) before passing it in.
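
    For anyone landing here later, a minimal end-to-end sketch of the above (clickPos1, clickPos2 and GetRealDepthAt are hypothetical placeholders; it assumes a 320x240 depth image and real depths already converted to mm):

        // Hypothetical: real depths in mm at the two clicked pixels, shifted left 3 as the docs require
        short d1 = (short)(GetRealDepthAt(clickPos1) << 3);
        short d2 = (short)(GetRealDepthAt(clickPos2) << 3);

        Vector p1 = nui.SkeletonEngine.DepthImageToSkeleton(
            (float)clickPos1.X / 320, (float)clickPos1.Y / 240, d1);
        Vector p2 = nui.SkeletonEngine.DepthImageToSkeleton(
            (float)clickPos2.X / 320, (float)clickPos2.Y / 240, d2);

        // Skeleton space is in metres; scale to mm for measuring
        double widthMm = Math.Abs(p2.X - p1.X) * 1000;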

    Wednesday, June 29, 2011 11:24 AM
  • Gilbert,

    You only need depth stream to use SkeletonEngine.DepthImageToSkeleton.

    The depth value is calculated as described in question 8 in the FAQ (http://social.msdn.microsoft.com/Forums/en-US/kinectsdknuiapi/thread/4da8c75e-9aad-4dc3-bd83-d77ab4cd2f82). It depends on whether you're using depth stream or depth plus player index stream.

    Then the normalized depth pixel coordinates (between 0 and 1) are obtained by dividing integer depth pixel coordinates by the dimensions of depth image. E.g.: for a 640x480 image, the normalized coordinates for pixel (450,300) would be (450.0/640,300.0/480), which is (0.703125,0.625).

    Hope this helps,
    Eddy


    I'm here to help
    Wednesday, June 29, 2011 10:09 PM
  • Crucible, you're right! I'm sorry I missed that point, even though it's in the docs. That explains a lot of the problems people have been having with the DepthImageToSkeleton method (and its C++ equivalent, NuiTransformDepthImageToSkeletonF).

    I just confirmed that the first thing NuiTransformDepthImageToSkeletonF does with the depth value passed in is right-shift it by 3, before performing any calculations. So yes, it will work fine when using data directly from the depth + player index stream, but it will not be accurate when passing data from the depth-only stream unless you first left-shift it by 3. The same applies if you computed "realDepth = (depthFrame16[i16 + 1] << 5) | (depthFrame16[i16] >> 3)" before passing depth from the depth + player index stream.
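
    In code form, the two cases look like this (rawValue and realDepthMm are hypothetical names; nothing beyond the facts above is assumed):

        // Depth + player index stream: the raw 16-bit value already carries the
        // depth shifted left by 3, so pass it straight in.
        short depthArg = rawValue;

        // Depth-only stream, or a realDepth already unpacked to mm: shift left 3
        // first, so the method's internal right-shift recovers the mm value.
        short depthArgFromMm = (short)(realDepthMm << 3);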

    This does look like a big source of confusion, so I'll add it to the FAQ and report it so we can consider making the API clearer for the next release.

    Sorry again for all your wasted time!
    Eddy


    I'm here to help
    Wednesday, June 29, 2011 10:26 PM
  • Thanks for tracking that down, Crucible - nice work!! This improved my 3D results from 20 cm of error to a few mm.


    Susan
    Thursday, June 30, 2011 1:52 PM
  • Hi Eddy,

    How can I use the depth map to skeleton coordinate transformation without using skeletal tracking? I want to write a performant application, and I actually don't need the skeleton at all.

    When I call nui.Initialize(RuntimeOptions.UseDepth); without skeletal tracking, I always get an error if I call nui.SkeletonEngine.DepthImageToSkeleton at runtime.

    Thanks, Mathew

    Monday, January 30, 2012 2:46 PM
  • You really didn't need to bump an old thread - I got really confused reading the replies above.

    Also, you'll probably find there's more than just Eddy on these forums replying ;)

     

    Have you tried using the calculations/functions found in the Kinect SDK includes? i.e. C:\Program Files\Microsoft SDKs\Kinect\v1.0 Beta2\inc\MSR_NuiSkeleton.h. You're probably using C# rather than C++, but you'll find the following lines may help:

    // Assuming a pixel resolution of 320x240
    // x_meters = (x_pixelcoord - 160) * NUI_CAMERA_DEPTH_IMAGE_TO_SKELETON_MULTIPLIER_320x240 * z_meters;
    // y_meters = (y_pixelcoord - 120) * NUI_CAMERA_DEPTH_IMAGE_TO_SKELETON_MULTIPLIER_320x240 * z_meters;
    #define NUI_CAMERA_DEPTH_IMAGE_TO_SKELETON_MULTIPLIER_320x240 (NUI_CAMERA_DEPTH_NOMINAL_INVERSE_FOCAL_LENGTH_IN_PIXELS)
     
    // Assuming a pixel resolution of 320x240
    // x_pixelcoord = (x_meters) * NUI_CAMERA_SKELETON_TO_DEPTH_IMAGE_MULTIPLIER_320x240 / z_meters + 160;
    // y_pixelcoord = (y_meters) * NUI_CAMERA_SKELETON_TO_DEPTH_IMAGE_MULTIPLIER_320x240 / z_meters + 120;
    #define NUI_CAMERA_SKELETON_TO_DEPTH_IMAGE_MULTIPLIER_320x240 (NUI_CAMERA_DEPTH_NOMINAL_FOCAL_LENGTH_IN_PIXELS)
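
    A rough C# port of those formulas, for anyone who can't open the header (the constant value below is what I recall from the Beta 2 MSR_NuiSkeleton.h; verify it against your own copy):

        // NUI_CAMERA_DEPTH_NOMINAL_INVERSE_FOCAL_LENGTH_IN_PIXELS from the header
        const float InverseFocalLength320x240 = 3.501e-3f;

        // Convert a 320x240 depth pixel (x, y) at zMetres to skeleton-space metres,
        // following the header comments verbatim (note the header comment does not
        // flip Y, even though skeleton-space Y points up while image Y points down).
        static void DepthPixelToMetres(int x, int y, float zMetres,
                                       out float xMetres, out float yMetres)
        {
            xMetres = (x - 160) * InverseFocalLength320x240 * zMetres;
            yMetres = (y - 120) * InverseFocalLength320x240 * zMetres;
        }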

    Hope that helps

    Tuesday, January 31, 2012 4:53 PM
  • Thanks for your reply - yes, you're right, I'm using C#.

    Where can I access those constants defined in the MSR_NuiSkeleton.h header file?

    Unfortunately I don't have the permissions to access the header file on this machine.

    Regards, Mathew


    Thursday, February 2, 2012 2:18 PM
  • Hey, has anyone figured out how to do this for the new SDK? I have tried:

        int x;
        int y;
        int[,] depthMap = new int[depthFrame.Width, depthFrame.Height];
        DepthImagePoint dpthpt = new DepthImagePoint();
        for (y = 0; y < depthFrame.Height; y++)
        {
            for (x = 0; x < depthFrame.Width; x++)
            {
                dpthpt.X = x;
                dpthpt.Y = y;
                depthMap[x, y] = dpthpt.Depth;
            }
        }

    But when I do this, all my depths read as 0. I based it off the GenerateColoredBytes method in the Beta SDK from 'Working with Depth Data'. Though I remember we used a height offset back then - does anyone know how to solve this? The example used in the new SDK generates pixel data with values from -8 to over 20000. How do we generate a (320x240) array of depth data for each coordinate?

    EDIT:

    Figured it out; here's my solution. I just inserted this into the GenerateColoredBytes method. Depth data is stored in a 320x240 array called depthMap. FYI, the Kinect seems to record certain depths that it cannot resolve as -1.

     

        int x;
        int y;
        int[,] depthMap = new int[depthFrame.Width, depthFrame.Height];

        // Loop through all distances and pick an RGB color based on distance
        for (int depthIndex = 0, colorIndex = 0;
            depthIndex < rawDepthData.Length && colorIndex < pixels.Length;
            depthIndex++, colorIndex += 4)
        {
            // Get the depth value
            int depth = rawDepthData[depthIndex] >> DepthImageFrame.PlayerIndexBitmaskWidth;

            // Map x,y coordinates
            x = depthIndex % depthFrame.Width;
            y = depthIndex / depthFrame.Width;
            depthMap[x, y] = depth;
        }

    Thursday, February 2, 2012 3:40 PM
  • @AleksMA:

    I'm not sure what you're trying to accomplish with this code. DepthImagePoint is a simple struct, a holder of values. It isn't tied to the depth frame, so you can't just set its X and Y properties and expect its Depth property to automatically change accordingly. The Depth property will continue to have its initial value of zero (as you've discovered).

    If you're just trying to convert the depth frame data into a 2-dimensional array of depth-only values in mm, I think this is the code you're after:

            private short[] _rawPixelData;
            private int[,] _depthMap;
            void OnDepthFrameReady(object sender, DepthImageFrameReadyEventArgs e)
            {
                using (DepthImageFrame f = e.OpenDepthImageFrame())
                {
                    if (f != null)
                    {
                        // Allocate raw pixel data array, if necessary
                        if ((this._rawPixelData == null) || (this._rawPixelData.Length != f.PixelDataLength))
                        {
                            this._rawPixelData = new short[f.PixelDataLength];
                        }
     
                        // Allocate depth map array, if necessary
                        if ((this._depthMap == null) ||
                            (this._depthMap.GetLength(0) != f.Width) ||
                            (this._depthMap.GetLength(1) != f.Height))
                        {
                            this._depthMap = new int[f.Width,f.Height];
                        }
     
                        // Obtain the raw pixel data for this frame and copy to depth map
                        f.CopyPixelDataTo(this._rawPixelData);
                        for (int x = 0; x < f.Width; x++)
                        {
                            for (int y = 0; y < f.Height; y++)
                            {
                                _depthMap[x, y] = _rawPixelData[(y*f.Width) + x] >> DepthImageFrame.PlayerIndexBitmaskWidth;
                            }
                        }
                    }
                }
            }
    

    Note that it's not a good idea to reallocate your array on every frame (30 times a second) as this will put undue wear-and-tear on the heap and garbage collector. Instead, allocate it only the first time, or whenever its size needs to change, as in the example above.

    The copy from one array to the other is also fairly expensive. If creating a separate _depthMap array could be avoided, I'd recommend doing so. Exactly what you replace it with really depends on what you're planning to do with the data. If you just need to sample a subset of the depth values, you could write a helper function that can access them directly from the raw pixel data:

            int GetDepth(int x, int y)
            {
                return _rawPixelData[(y*_frameWidth) + x] >> DepthImageFrame.PlayerIndexBitmaskWidth;
            }
    

    If you're planning to run a loop over the entire depth map, your best bet for performance would be to pin _rawPixelData in an unsafe pointer, loop over it, and use the values directly:

                        unsafe
                        {
                            fixed (short* start = _rawPixelData)
                            {
                                short* p = start;
                                short* end = p + f.PixelDataLength;
     
                                for (int x = 0, y = 0; p < end; p++, x++)
                                {
                                    if (x == f.Width)
                                    {
                                        x = 0;
                                        ++y;
                                    }
     
                                    int depth = *p >> DepthImageFrame.PlayerIndexBitmaskWidth;
     
                                    // use depth here
                                }
                            }
                        }
    

    Hope this helps.

    John
    K4W Dev

     

    Thursday, February 2, 2012 8:40 PM
  • Thanks John! I understand what you are getting at regarding the copying procedure; unfortunately I want to apply a depth filter (minCutoff < depth < maxCutoff), so I have to read all the values. I have, however, changed my second loop to record only the x,y positions within the depth filter, rather than applying it to the whole loop and then extracting the [x,y]'s afterwards:

     

     

        // Get the raw data from the Kinect with the depth for every pixel
        short[] rawDepthData = new short[depthFrame.PixelDataLength];
        depthFrame.CopyPixelDataTo(rawDepthData);

        Byte[] pixels = new byte[depthFrame.Height * depthFrame.Width * 4];

        const int BlueIndex = 0;
        const int GreenIndex = 1;
        const int RedIndex = 2;
        int x;
        int y;

        int boundaryIndex = 0;
        int[,] depthMap = new int[depthFrame.Width, depthFrame.Height];
        int[,] depthboundary = new int[depthFrame.Width * depthFrame.Height, 2];

        // Loop through all distances and pick an RGB color based on distance
        for (int depthIndex = 0, colorIndex = 0;
            depthIndex < rawDepthData.Length && colorIndex < pixels.Length;
            depthIndex++, colorIndex += 4)
        {
            // Get the depth value
            int depth = rawDepthData[depthIndex] >> DepthImageFrame.PlayerIndexBitmaskWidth;

            if (depth > 1000 && depth < 1500)
            {
                // We are in the boundary
                pixels[colorIndex + BlueIndex] = 0;
                pixels[colorIndex + GreenIndex] = 255;
                pixels[colorIndex + RedIndex] = 0;

                // Map x,y coordinates and record the boundary point
                x = depthIndex % depthFrame.Width;
                y = depthIndex / depthFrame.Width;
                depthMap[x, y] = depth;
                depthboundary[boundaryIndex, 0] = x;
                depthboundary[boundaryIndex, 1] = y;
                boundaryIndex++;
            }
            else
            {
                // We are outside the boundary, either too close or too far
                pixels[colorIndex + BlueIndex] = 0;
                pixels[colorIndex + GreenIndex] = 0;
                pixels[colorIndex + RedIndex] = 0;
            }
        }

        return pixels;
    }

    Now, the reason I need the [x,y] data is that I want to use the depth map analysis in conjunction with colour stream analysis. Basically, I want to use the points that are in the depth boundary as a basis for selecting which pixels I analyse during the RGB analysis.

    Note that the reason for the depthboundary array is to reduce the number of tasks the processor has to handle per loop (had I just used the depthMap, I would have to check 320 x 240 fields during RGB analysis, whereas by storing the [x,y] values in depthboundary, I only have to check boundaryIndex fields).

    Can I extract a similar 2-dimensional array [boundaryIndex, 2] wherein each element of the array represents the R, G and B values, or do I have to make 3 separate arrays to store the R, G and B values of each [x,y] point respectively? In other words, I simply want to store the RGB data associated with each point in the depth boundary.

    EDIT: Just realized colorFrame.CopyPixelDataTo() returns BGR values in extractable form. ^_^'
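
    For completeness, a sketch of that extraction (colorPixels is a hypothetical byte array filled by colorFrame.CopyPixelDataTo, assumed to be 32 bits per pixel in B, G, R, A order at the same resolution as the depth map; in practice the colour and depth images aren't pixel-aligned, so something like MapDepthToColorImagePoint would be needed for exact correspondence):

        // One row per boundary point, holding its R, G, B bytes
        byte[,] boundaryRgb = new byte[boundaryIndex, 3];
        for (int i = 0; i < boundaryIndex; i++)
        {
            int px = depthboundary[i, 0];
            int py = depthboundary[i, 1];
            int offset = (py * colorFrame.Width + px) * 4;  // 4 bytes per pixel (BGRA)

            boundaryRgb[i, 0] = colorPixels[offset + 2];    // R
            boundaryRgb[i, 1] = colorPixels[offset + 1];    // G
            boundaryRgb[i, 2] = colorPixels[offset + 0];    // B
        }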




    Friday, February 3, 2012 8:45 PM
  • I can't understand the difference between the two pieces of code here. In my own code I use something similar to John's:

     for (int x = 0; x < depthFrame.Width; x++)
     {
         for (int y = 0; y < depthFrame.Height; y++)
         {
             _depthMap[x, y] = _rawPixelData[(y * depthFrame.Width) + x] >> DepthImageFrame.PlayerIndexBitmaskWidth;

             objPoint.X = (float)x / depthFrame.Width;
             objPoint.Y = (float)y / depthFrame.Height;
             objPoint.Z = _depthMap[x, y];

    while Alek uses:

    for (int depthIndex = 0, colorIndex = 0;
        depthIndex < rawDepthData.Length && colorIndex < pixels.Length;
        depthIndex++, colorIndex += 4)
    {
        int depth = rawDepthData[depthIndex] >> DepthImageFrame.PlayerIndexBitmaskWidth;

        if (depth > 1000 && depth < 1500)
        {
            x = depthIndex % depthFrame.Width;
            y = depthIndex / depthFrame.Width;
            depthMap[x, y] = depth;

    Do they do the same thing or not? What is the difference? Which is better? Can someone explain it to me? Thank you!

     

    Tuesday, May 22, 2012 7:06 AM