The exact meaning of WorldToCameraTransform in KinectFusion
Question

Hi,
I'm trying to do some augmented reality applications with KinectFusion, which requires to transform coordinates between the local 2D image and the global 3D world. KinectFusion does output a Matrix4 standing for "WorldToCameraTransform", but my several guesses of how to use it just don't work. The internal implementation of KinectFusion is encapsulated so I cannot tell from the example usage. I also searched online but didn't find related document. Does anyone here know about the details about how to use it? (I'm using the C# version of the SDK)
Here is my attempt. Given a 2D point (x, y), the corresponding depth d (from DepthFloatFrame), and the WorldToCameraTransform M, I use the pinhole camera model and transform the 2D point to 3D as [(x - 320) / 640 * d, (y - 240) / 480 * d, d, 1] * inv(M). Since this is converting the camera view to the world view, I put an inverse on the matrix.
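For comparison, here is a minimal back-projection sketch in Python rather than C#. The intrinsics fx, fy, cx, cy are illustrative placeholders (not the real Kinect calibration), and it assumes the Kinect Matrix4 row-vector convention, i.e. the point is multiplied on the left and the translation sits in the fourth row:

```python
# Back-project a depth pixel (x, y, d) to world coordinates, assuming:
#  - a pinhole model with ILLUSTRATIVE intrinsics fx, fy, cx, cy
#  - the Matrix4 row-vector convention (translation in the fourth row),
#    so world_point = camera_point * inv(world_to_camera)

def mat_vec_row(v, m):
    """Multiply a 1x4 row vector by a 4x4 matrix (row-vector convention)."""
    return [sum(v[k] * m[k][j] for k in range(4)) for j in range(4)]

def invert_rigid(m):
    """Invert a rigid transform stored row-vector style: [R 0; t 1]."""
    r = [[m[j][i] for j in range(3)] for i in range(3)]  # transpose of rotation
    t = m[3][:3]
    inv = [[r[i][j] for j in range(3)] + [0.0] for i in range(3)]
    # inverse translation is -t * R^T
    inv.append([-sum(t[k] * r[k][j] for k in range(3)) for j in range(3)] + [1.0])
    return inv

def back_project(x, y, d, world_to_camera,
                 fx=525.0, fy=525.0, cx=319.5, cy=239.5):
    """Lift a depth pixel into camera space, then move it to world space."""
    cam = [(x - cx) / fx * d, (y - cy) / fy * d, d, 1.0]
    return mat_vec_row(cam, invert_rigid(world_to_camera))[:3]
```

Note the focal-length division uses fx/fy rather than the image width/height as in the attempt above; whether the y axis also needs a sign flip depends on the camera's axis convention.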
I really appreciate any replies. And if any additional info is needed, I'm glad to provide it.
Answers

The matrix represents the camera transform to its position in the reconstructed world. Keep in mind the values are transposed, i.e., the position is in .M41, .M42, .M43, .M44.
If you load the KinectFusionBasics-WPF sample, this should show you the position of the camera with respect to the reconstructed volume's world:
Debug.WriteLine("{0,6:N} {1,6:N} {2,6:N} {3,6:N}", this.worldToCameraTransform.M41, this.worldToCameraTransform.M42, this.worldToCameraTransform.M43, this.worldToCameraTransform.M44);
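To illustrate what "transposed" means here, a small Python sketch of the row-major Matrix4 layout (not SDK code; the translation values are arbitrary). The field Mij maps to m[i-1][j-1], the translation occupies M41–M43, and a point is transformed by multiplying it as a row vector on the left:

```python
# Illustrative layout of the Kinect Matrix4 fields as a nested list,
# m[i][j] <-> M{i+1}{j+1}; translation sits in the fourth row (M41..M43).

def make_translation(tx, ty, tz):
    return [[1.0, 0.0, 0.0, 0.0],
            [0.0, 1.0, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
            [tx,  ty,  tz,  1.0]]

def transform(point, m):
    """Apply the transform to a 3D point using the row-vector convention."""
    v = list(point) + [1.0]
    return [sum(v[k] * m[k][j] for k in range(4)) for j in range(3)]

# reading the position back out, as in the Debug.WriteLine above:
camera_position = make_translation(0.1, 0.2, 1.5)[3][:3]  # fields M41, M42, M43
```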
Carmine Sirignano - MSFT
 Proposed as answer by Carmine Si - MSFT (Microsoft employee), Monday, September 9, 2013 10:23 PM
 Marked as answer by shability, Tuesday, September 17, 2013 4:38 PM
All replies


Thanks a lot for the reply, Carmine! With your reply I finally figured out the details. Another thing worth noticing is that the RGB image and the depth image are not aligned (i.e., (u, v) in the depth image is not the depth for the same pixel in the color image). This also kept me from understanding the exact meaning of the matrix. Thanks again for your help!
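On that alignment point: in the C# SDK the depth-to-color registration is handled for you (e.g. by CoordinateMapper in SDK 1.6+), but conceptually it boils down to back-projecting the depth pixel, moving the 3D point through the depth-to-color extrinsics, and re-projecting with the color intrinsics. A hedged pure-Python sketch of that idea, with every numeric parameter illustrative rather than the real Kinect calibration:

```python
# Map a depth pixel to the corresponding color pixel, assuming a pinhole
# model for both cameras and known depth->color extrinsics (R, t).
# All numeric parameters are ILLUSTRATIVE, not real Kinect calibration.

def depth_pixel_to_color_pixel(x, y, d, depth_intr, color_intr, extr):
    fx, fy, cx, cy = depth_intr
    # back-project to a 3D point in the depth camera frame
    p = [(x - cx) / fx * d, (y - cy) / fy * d, d]
    # move into the color camera frame: q = R * p + t
    R, t = extr
    q = [sum(R[i][k] * p[k] for k in range(3)) + t[i] for i in range(3)]
    # re-project with the color camera intrinsics
    fx2, fy2, cx2, cy2 = color_intr
    return (q[0] / q[2] * fx2 + cx2, q[1] / q[2] * fy2 + cy2)
```

With identity extrinsics and identical intrinsics the mapping is, as expected, the identity; the real sensor's nonzero baseline between the two cameras is what causes the misalignment described above.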

Hi
I am working on Kinect Fusion and I am facing the same problem. I am not able to understand what the worldToCameraTransform exactly represents. For example, if I have to apply a translation in the x direction, where shall I add that value: at M41 or at M43?
Generally, the rotation matrix for a rotation around the x axis with zero translation is
1 0 0 0
0 cos_theta -sin_theta 0
0 sin_theta cos_theta 0
0 0 0 1
How shall I represent this transformation in the worldToCameraTransform?
M11 = 1, M21 = 0, M31 = 0, M41 = 0
M12 = 0, M22 = COS_THETA, M32 = -SIN_THETA, M42 = 0
M13 = 0, M23 = SIN_THETA, M33 = COS_THETA, M43 = 0
M14 = 0, M24 = 0, M34 = 0, M44 = 1
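One way to sanity-check the transposed layout (a Python sketch; theta and the test point are arbitrary): transforming a row vector by the transposed matrix must give the same result as the textbook column-vector rotation.

```python
import math

def rot_x(theta):
    """Textbook rotation about the x axis (column-vector convention)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[1, 0,  0, 0],
            [0, c, -s, 0],
            [0, s,  c, 0],
            [0, 0,  0, 1]]

def transpose(m):
    return [[m[j][i] for j in range(4)] for i in range(4)]

def apply_col(m, v):  # column-vector convention: v' = M * v
    return [sum(m[i][k] * v[k] for k in range(4)) for i in range(4)]

def apply_row(v, m):  # row-vector convention:    v' = v * M
    return [sum(v[k] * m[k][j] for k in range(4)) for j in range(4)]
```

So the sine sign flips move with the transpose (between the M23 and M32 fields), and a pure x translation goes into the fourth-row field M41.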
I just know that, with respect to the Kinect camera, the z axis is perpendicular to the camera face and points out of it. Can you please tell me which are the x and y axes?
I have really tried very hard to understand how these things work, but I am not able to find any proper information on the internet, hence I am bothering you with these small questions. It would be really helpful if you could explain these things to me.
Thanks a lot.

Hi bkanchan,
After my problem got solved, I wrote a detailed blog post about the technical details: https://grapeot.me/some-technical-details-about-kinect-fusion.html. Hope it will be useful for you.