# The exact meaning of WorldToCameraTransform in KinectFusion

• ### Question

• Hi,

I'm trying to build some augmented reality applications with KinectFusion, which requires transforming coordinates between the local 2D image and the global 3D world. KinectFusion does output a Matrix4 called "WorldToCameraTransform", but my several guesses at how to use it just don't work. The internal implementation of KinectFusion is encapsulated, so I can't tell from the example usage, and I searched online but didn't find any related documentation. Does anyone here know the details of how to use it? (I'm using the C# version of the SDK.)

Here is my attempt. Given a 2D point (x, y), the corresponding depth d (from the DepthFloatFrame), and the WorldToCameraTransform M, I use the pinhole camera model and transform the 2D point to 3D as [(x - 320) / 640 * d, (y - 240) / 480 * d, d, 1] * inv(M). Since this converts the camera view to the world view, I apply the inverse of the matrix.
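The attempt above can be sketched as follows. Note that a standard pinhole model divides by a focal length in pixels, not by the image width/height as in the guess above; the fx, fy, cx, cy values here are assumed intrinsics for a 640x480 depth image, not values taken from the SDK, and M is treated as acting on row vectors:

```python
import numpy as np

def depth_pixel_to_world(x, y, d, M,
                         fx=571.26, fy=571.26, cx=320.0, cy=240.0):
    """Back-project a depth pixel (x, y) with depth d (metres) to world
    coordinates, assuming a pinhole model and a row-vector convention
    (p_world = p_camera @ inv(M)). fx/fy/cx/cy are assumed intrinsics."""
    # Camera-space point from the pinhole model.
    p_cam = np.array([(x - cx) * d / fx, (y - cy) * d / fy, d, 1.0])
    # Row-vector convention: world = camera * inv(M).
    p_world = p_cam @ np.linalg.inv(M)
    return p_world[:3] / p_world[3]

# With the identity transform, camera space equals world space:
# the centre pixel at depth 1 m lands at [0, 0, 1].
print(depth_pixel_to_world(320, 240, 1.0, np.eye(4)))
```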

I really appreciate any replies. And if any additional info is needed, I'm glad to provide it.

Monday, September 9, 2013 12:57 PM

• The matrix represents the camera's transform to its position in the reconstructed world. Keep in mind that the values are transposed, i.e. the position is in .M41, .M42, .M43, .M44.

If you load the KinectFusionBasics-WPF sample, this should show you the position of the camera with respect to the reconstructed volume.

`Debug.WriteLine("{0,6:N} {1,6:N} {2,6:N} {3,6:N}", this.worldToCameraTransform.M41, this.worldToCameraTransform.M42, this.worldToCameraTransform.M43, this.worldToCameraTransform.M44);`
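A quick way to see this "transposed" (row-vector) layout is that the translation sits in the fourth row, so a point is transformed as p * M rather than M * p. A minimal sketch with made-up numbers:

```python
import numpy as np

# Row-vector ("transposed") layout: the translation components sit in
# M41..M43, exactly the fields printed by the Debug.WriteLine above.
M = np.eye(4)
M[3, :3] = [0.1, 0.2, 1.5]   # made-up translation in metres

p = np.array([0.0, 0.0, 0.0, 1.0])  # the origin, as a row vector
print(p @ M)                        # picks up the fourth-row translation
```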

Carmine Sirignano - MSFT

Monday, September 9, 2013 10:22 PM

### All replies

• Thanks a lot for the reply, Carmine! With your reply I finally figured out the details. Another thing worth noticing: the RGB image and the depth image are not aligned (i.e. (u, v) in the depth image does not give the depth of the same pixel in the color image). This also kept me from understanding the exact meaning of the matrix. Thanks again for your help!
Tuesday, September 17, 2013 4:38 PM
• Hi

I am working on Kinect Fusion and I am facing the same problem. I am not able to understand what the WorldToCameraTransform exactly represents. If I have to apply a translation in the x direction, where should I add that value: at M41 or M43?

Generally, the rotation matrix for a rotation around the x axis with zero translation is

```
1      0           0          0
0      cos_theta  -sin_theta  0
0      sin_theta   cos_theta  0
0      0           0          1
```

How should I represent this transformation in the WorldToCameraTransform?

M11 = 1, M21 = 0, M31 = 0, M41 = 0

M12 = 0, M22 = cos_theta, M32 = -sin_theta, M42 = 0

M13 = 0, M23 = sin_theta, M33 = cos_theta, M43 = 0

M14 = 0, M24 = 0, M34 = 0, M44 = 1

I only know that, with respect to the Kinect camera, the z axis is perpendicular to the camera face and points out of it. Can you please tell me which are the x and y axes?
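Under the layout described earlier in the thread (translation in the fourth row), an x-translation would go in M41, and the rotation block is the transpose of the familiar column-vector matrix. A sketch, assuming that row-vector convention; the values here are illustrative, not taken from the SDK:

```python
import numpy as np

def world_to_camera(theta, tx=0.0):
    """Rotation about the x axis plus an x-translation, in the
    row-vector layout (Mij = row i, column j; translation in row 4)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([
        [1.0, 0.0, 0.0, 0.0],   # M11 M12 M13 M14
        [0.0,   c,   s, 0.0],   # M22 = cos_theta, M23 = sin_theta
        [0.0,  -s,   c, 0.0],   # M32 = -sin_theta, M33 = cos_theta
        [ tx, 0.0, 0.0, 1.0],   # M41 holds the x-translation
    ])

M = world_to_camera(np.pi / 2, tx=0.5)
p = np.array([0.0, 1.0, 0.0, 1.0])  # a point on the y axis, as a row vector
print(p @ M)                        # rotated onto the z axis, shifted by tx
```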

I have tried very hard to understand how these things work but have not been able to find proper information on the internet, hence I am bothering you with these small questions. It would be very helpful if you could explain these things to me.

Thanks a lot.

Monday, December 9, 2013 11:39 PM
• Hi bkanchan,

After my problem was solved, I wrote a detailed blog post about the technical details here: https://grapeot.me/some-technical-details-about-kinectfusion.html. I hope it will be useful for you.

Tuesday, December 10, 2013 1:19 AM