Transform between Skeleton coordinate and camera coordinate
Question

Hi,
I'm doing skeleton tracking using the tracker, which works well.
My problem is that I need to represent the skeleton in the RGB camera's coordinate system. (I do not mean the 2D image coordinates of the points in the color image, which are available via MapSkeletonPointToColorPoint in CoordinateMapper.)
I would like to have the 3D positions of the skeleton points (currently in the Skeleton coordinate system) expressed in the RGB camera's coordinate system. This should be simple once the rigid-body transformation between the Skeleton coordinate system and the RGB camera coordinate system is known. However, it does not seem to be available in the SDK.
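To make the request concrete, here is a minimal sketch of what applying such a rigid-body transformation would look like once it is known. The rotation and translation values below are hypothetical placeholders chosen only to keep the arithmetic easy to follow; they are not values from the SDK or from any real Kinect.

```python
def transform_point(R, t, p):
    """Apply p' = R p + t, with R a 3x3 row-major rotation and t, p 3-vectors."""
    return tuple(sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3))


def invert_rigid(R, t):
    """Return (R^T, -R^T t), the transform that maps back to the original frame."""
    Rt = [[R[j][i] for j in range(3)] for i in range(3)]
    t_inv = tuple(-sum(Rt[i][j] * t[j] for j in range(3)) for i in range(3))
    return Rt, t_inv


# Hypothetical extrinsics (assumption): 90-degree rotation about z
# plus a 2.5 cm shift along x. Real values must come from calibration.
R_skel_to_rgb = [[0.0, -1.0, 0.0],
                 [1.0, 0.0, 0.0],
                 [0.0, 0.0, 1.0]]
t_skel_to_rgb = (0.025, 0.0, 0.0)

joint_skel = (0.1, 0.2, 2.0)  # a joint position in meters, Skeleton frame
joint_rgb = transform_point(R_skel_to_rgb, t_skel_to_rgb, joint_skel)
```

The inverse helper is included because calibration tools often report the transform in the opposite direction (camera to Skeleton frame rather than Skeleton to camera).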
I thought the Skeleton coordinate system would coincide with the depth camera's coordinate system, but some evaluations show that it does not. I have tried explicit calibration between the RGB camera and the IR camera in order to get the relative transformation between the Skeleton and RGB coordinate systems, but it does not work. (In fact, I calibrated multiple Kinects, and the estimated relative transformations are similar, but the color-depth registration as well as the skeleton-RGB registration differs considerably from device to device.)
Is there any way to convert 3D points from the Skeleton coordinate system into the RGB camera's coordinate system?
All replies

Understand that there is no 3D color space for a color camera. A color camera has no understanding of a 3-dimensional world. You could map the color space to depth space, but there are areas where the IR and 2D-to-3D projection will result in invalid mappings.
Carmine Sirignano  MSFT
 Proposed as answer by Carmine Si  MSFT (Microsoft employee), Wednesday, December 10, 2014 8:19 PM

Thank you for your reply.
I understand that a color camera alone cannot perceive 3D information. However, the camera should still have its own coordinate system. As any computer vision textbook explains, the x- and y-axes of the camera coordinate system conventionally coincide with the x and y directions of the image plane, its z-axis points in the camera's forward direction, and the origin lies at the camera's optical center.
Intuitively, MapSkeletonPointToColorPoint should be doing the following, whatever its real implementation is. First, a point in the Skeleton coordinate system is transformed into the camera coordinate system by a proper Euclidean transformation. Second, the transformed point (now in the camera coordinate system) is projected onto the image plane using the camera's intrinsic parameters.
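The two conceptual steps can be sketched as follows. This is only an illustration of the textbook pinhole model, not the SDK's actual implementation: the function name, the extrinsics (R, t), and the intrinsics (fx, fy, cx, cy) are all assumptions, and lens distortion is ignored.

```python
def map_skeleton_to_pixel(p_skel, R, t, fx, fy, cx, cy):
    """Hypothetical two-step mapping: rigid transform, then pinhole projection.

    Assumes the camera frame has z pointing forward and x, y aligned with
    the image axes. Returns the 3D point in the camera frame and its pixel.
    """
    # Step 1: Euclidean transform from the Skeleton frame to the camera frame.
    x, y, z = (sum(R[i][j] * p_skel[j] for j in range(3)) + t[i]
               for i in range(3))
    if z <= 0:
        raise ValueError("point is behind the camera")
    # Step 2: perspective projection with the intrinsic parameters.
    u = fx * x / z + cx
    v = fy * y / z + cy
    return (x, y, z), (u, v)


# Placeholder values for illustration only (not real Kinect parameters).
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
p_cam, pixel = map_skeleton_to_pixel(
    (0.2, -0.1, 2.0), identity, (0.2, 0.0, 0.0),
    fx=500.0, fy=500.0, cx=320.0, cy=240.0)
```

The intermediate value `p_cam` is exactly the quantity being asked for in this thread: the SDK exposes only the final pixel, not the 3D point after step one.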
What I'm asking is how to get the Euclidean transformation between the Skeleton coordinate system and the camera coordinate system.
If we have two cameras, the transformation can be simply estimated by stereo calibration, but according to my observation, the skeleton coordinate system and the IR camera's (i.e., depth sensor's) coordinate system are not the same.
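One way to test whether the two frames coincide is to back-project a joint's depth pixel into 3D with the depth camera's intrinsics and compare the result with the joint position the tracker reports. The sketch below assumes the depth pixel and depth value for the joint are already available (for example from the coordinate mapper's skeleton-to-depth mapping) and that the depth is given in meters, so convert first if it is in millimeters. The intrinsics and sample values are made up.

```python
import math


def backproject(u, v, depth_m, fx, fy, cx, cy):
    """Pinhole back-projection of a depth pixel into the depth camera frame."""
    x = (u - cx) * depth_m / fx
    y = (v - cy) * depth_m / fy
    return (x, y, depth_m)


def distance(a, b):
    """Euclidean distance between two 3-vectors."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


# Made-up sample: depth pixel of a joint and the tracker's reported position.
p_depth_frame = backproject(400, 200, 2.0, fx=500.0, fy=500.0, cx=320.0, cy=240.0)
p_reported = (0.30, -0.16, 2.0)  # hypothetical Skeleton-frame joint, meters
residual = distance(p_depth_frame, p_reported)
```

Repeating this over many joints and frames would show whether the residual looks like noise or like a consistent offset or axis flip, which would point to a difference in convention rather than a calibration error.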

This is not provided, as we already do the mapping as part of the coordinate mapper functionality. Each sensor will be slightly different, so if you need more precision than what is provided, you will have to do your own calibration. This is discussed in other threads; you might find more information on this site:
http://nicolas.burrus.name/index.php/Research/KinectCalibration
Carmine Sirignano  MSFT

OK,
So you are saying that the Euclidean transformation between the Skeleton coordinate system and the color camera coordinate system is not provided by the SDK.
As you may have noticed, my question stems from the observation that the Skeleton coordinate system does not appear to be the same as the depth sensor's coordinate system, though many answers on this forum say it is. If they were the same, a simple stereo calibration would solve my problem. Because my problem is not depth-color mapping for texture mapping, Burrus' calibration (which I have already tried) does not work, mainly because his method does not use the Skeleton coordinate system. He estimates the 3D point locations himself, so they are represented in the depth camera's coordinate system, not in the Skeleton coordinate system. In my case, the Kinect skeleton tracker reports joint locations only in the Skeleton coordinate system.
Maybe the depth sensor's coordinate system differs from the IR camera's coordinate system, although I assumed the two are the same. Please confirm this, and if they are different, let me know what (and where) the depth sensor's coordinate system is.
Thank you.
