# Depth stream resolution question and depth-space to skeleton-space conversion problems

• ### Question

• Hi,

If I request a 640x480 depth stream, does it actually contain more information, or do I receive the same depth information as with a 320x240 stream, with the higher resolution achieved through interpolation?

I also realised that the code for NuiTransformDepthImageToSkeletonF in the SDK assumes the depth frame has a dimension of 320x240. So if I use a 640x480 depth stream, is there a way for me to convert the depth-image pixels to 3D coordinates?

Thursday, July 28, 2011 3:52 PM


### All replies

• As far as I know, the only way to turn the depth frame into 3D coordinates is to use DirectX 3D or WPF 3D with the x, y, z data of each joint.

Here is a screenshot of mine using WPF 3D:

http://blog.csdn.net/nowheremansq/article/details/6642191

Thursday, July 28, 2011 4:04 PM
• I think you misunderstood me there.

The SDK offers the NuiTransformDepthImageToSkeletonF function, which accepts an x- and y-coordinate and the depth of that pixel (in mm, shifted left by 3 bits), and returns a Vector4 containing the skeleton-space (3D) coordinates of this pixel. But as I mentioned, this function assumes that the depth frame (where the coordinates come from) has a dimension of 320x240. So I wanted to know whether there is a way to get a Vector4 for coordinates that originate from a 640x480 depth image.

Thursday, July 28, 2011 4:08 PM
• I'm so sorry for the misunderstanding. Is this function only available in C++? As far as I know, in C# the depth frame and skeleton frame are separate, and developers do not have to do a conversion themselves.

Thursday, July 28, 2011 4:25 PM
• In C++ the frames are also separated. The C# equivalent of NuiTransformDepthImageToSkeletonF is
 SkeletonEngine.DepthImageToSkeleton
Thursday, July 28, 2011 4:33 PM
• Hello

I searched the official MS document "SkeletalViewer_Walkthrough.pdf" and found:

"Valid resolutions for depth and player index data are Resolution320x240 and Resolution80x60."

Is Resolution640x480 valid for a depth frame without player index?

Thursday, July 28, 2011 4:56 PM
• Nowheremansq,

you're randomizing this thread a fair amount. I would request that you please take a little time to read and understand a question before posting a reply, or ask clarifying questions if there is ambiguity. The question is not about depth+player index, but about plain depth stream.

kurayamiv, NuiTransformDepthImageToSkeletonF takes floating-point values in the 0.0 to 1.0 range, which are normalized from an image of any dimension, even 80x60. We use 320x240-based constants in the implementation of NuiTransformDepthImageToSkeletonF purely for convenience, so that we only need a single multiplication-factor definition.

Since the NUI_CAMERA_DEPTH_NOMINAL_FOCAL_LENGTH_IN_PIXELS constant is scaled (for convenience) to be based on a 320x240 pixel image, math that uses that constant should also use 320x240 pixel dimensions, so that pixel dimensions in the numerator cancel with pixel dimensions in the denominator. If this constant were instead based on a 640x480 pixel image, then math that uses it should also use 640x480 pixel dimensions. At the end of NuiTransformDepthImageToSkeletonF, the values returned are in meters, not pixels, so the granularity of the pixel math used internally should be irrelevant. Unless maybe you've spotted a potential bug in our space transformations, in which case we welcome the feedback.

Does this make sense?
Eddy

I'm here to help
• Marked as answer by Friday, July 29, 2011 12:47 PM
• Unmarked as answer by Friday, July 29, 2011 12:47 PM
• Marked as answer by Friday, July 29, 2011 1:05 PM
Thursday, July 28, 2011 7:49 PM
• The 640x480 stream has more information. At 640x480 I get about 345 distinct depth values for a given pixel, while at 320x240 I only get about 256. Watching the isochronous stream from the Kinect camera while streaming only depth, the data rate seems to be the same whichever resolution you request, about 10 million bytes per second. That's right around what an uncompressed, packed 11-bit stream at 640x480 would be. So I'm inclined to think 320x240 is downsampled, though I don't know why I don't get as many distinct values.

Friday, July 29, 2011 12:22 AM
• Hello Eddy,

I wrote a function that gets the world coordinates of the palm, transforms them to depth space, and uses these coordinates to segment the hands and detect a more precise position of the palm. I use the depth stream without player index at a resolution of 640x480. The first thing I need to do with the received image is mirror it (because the skeleton data is always mirrored, while the depth stream at this resolution isn't), then segment the hands, and then calculate the position (in depth space) of the palm. Finally, I use the transform function to get the world coordinates of my computed palm.

The weird thing is that even though my calculated position of the hand is in proximity to the palm provided by the SDK, the coordinates differ greatly:

http://i.imgur.com/lIt0K.png

In the image you can see my computed palm (the blue circle) and the SDK palm (the thick white circle); please ignore the finger detection. The two points are close to each other, yet the world coordinates differ greatly (over 20 cm on the x-axis).

Here is the idea of the function:

```cpp
void getPalm(const Vector4 &handPos) {
    float fx, fy;
    NuiTransformSkeletonToDepthImageF(handPos, &fx, &fy);

    int origin_x = floor(fx * depthImg.cols + .5f);
    int origin_y = floor(fy * depthImg.rows + .5f);

    // segment the hand
    // ...

    // calculate the palm
    // ...

    Palm palm;
    palm.m_depthCoord = ...; // what we computed

    palm.m_world = NuiTransformDepthImageToSkeletonF(
        static_cast<float>(palm.m_depthCoord.m_x / depthImg.cols),
        static_cast<float>(palm.m_depthCoord.m_y / depthImg.rows),
        (palm.m_depthCoord.m_depth << 3));
}
```

PS: depthImg is a cv::Mat.

EDIT:

Never mind, I found my error... I feel so dumb :( ... static_cast<float>(palm.m_depthCoord.m_x / depthImg.cols) is wrong; it needs to be static_cast<float>(palm.m_depthCoord.m_x) / static_cast<float>(depthImg.cols).