MapCameraPointToDepthSpace (but without a KinectSensor or CoordinateMapper)

  • Question

  • So I'm trying to make it possible to play back recorded Kinect depth/color/body frames on a tablet or phone (i.e., without the SDK). Actually, I'm done with that part (look for the KinectEx project coming soon). The only thing that is unavailable is the mapping functionality since this is all tied to the sensor itself.

    I'm recording the results of an initial call to GetDepthCameraIntrinsics and GetDepthFrameToCameraSpaceTable in my recording file's header. I can use this however needed. Seems like it may be possible to reproduce MapDepthSpaceToCameraSpace, however I'm really not clear how to proceed with the rest.

    I'm far from an expert in optics (novice might be being generous), so the intrinsics info is Greek to me. Can I use that to do a projection of a CameraSpacePoint onto depth space (i.e., 3D to 2D projection)?

    And I'm completely clueless how I could accomplish depth <--> color mapping with what I have.

    Any help here would be appreciated. Is this possible, or am I just going to have to do without mapping?

    Paul T. York, Ph.D. Computer and Information Sciences Georgia Regents University

    Wednesday, August 6, 2014 3:28 AM
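For reference, the 3D-to-2D projection asked about here is a standard pinhole projection: the values returned by GetDepthCameraIntrinsics (focal lengths, principal point, and even-order radial distortion coefficients) fit the usual Brown distortion model. A minimal sketch follows; the field names match the SDK's CameraIntrinsics structure, but the axis conventions (the flipped y, and whether distortion is applied in the forward direction) are assumptions that should be validated against MapCameraPointToDepthSpace on real hardware.

```python
def camera_point_to_depth_space(p, intr):
    """Project a camera-space point (metres) to depth-image pixel coordinates.

    intr: dict with FocalLengthX/Y, PrincipalPointX/Y, and the
    RadialDistortionSecond/Fourth/SixthOrder coefficients as recorded
    from GetDepthCameraIntrinsics.
    """
    X, Y, Z = p
    if Z <= 0:
        return None  # behind the camera; not projectable
    # Perspective divide to normalized image coordinates.
    x, y = X / Z, Y / Z
    # Brown radial distortion, even powers only (assumption: applied forward).
    r2 = x * x + y * y
    d = (1.0
         + intr["RadialDistortionSecondOrder"] * r2
         + intr["RadialDistortionFourthOrder"] * r2 * r2
         + intr["RadialDistortionSixthOrder"] * r2 * r2 * r2)
    xd, yd = x * d, y * d
    # Map to pixels. The flipped y is an assumption: camera space is
    # y-up, while image rows grow downward.
    u = intr["FocalLengthX"] * xd + intr["PrincipalPointX"]
    v = intr["PrincipalPointY"] - intr["FocalLengthY"] * yd
    return u, v
```

A point on the optical axis, e.g. (0, 0, 1), should land exactly on the principal point regardless of distortion, which makes a cheap sanity check against the live mapper.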

All replies

A single initial call may not be enough (temperature influences the lenses/sensors). Validating your approach with our experts :)
    Wednesday, August 6, 2014 4:53 PM
I haven't found any good documentation about the API calls you mention, but I have a similar issue where I need to perform the equivalent of GetDepthFrameToCameraSpaceTable without calling it directly. In my case I want to do this lookup on the GPU in my OpenGL shader. The idea I'm pursuing (not fully functional yet) is to simply create a lookup table in the form of a 3D texture which spans the depth sensor's range (horizontal, vertical, and depth), say something like 128x128x128 texels. This 3D texture would contain the results of calling MapDepthPointsToCameraSpace for each point in depth space, sampled 128 times in each dimension. I can then look up any point in depth space by interpolating this 3D texture extremely fast and without calling any Kinect API methods. In my case, it's trivial to use this lookup table inside an OpenGL shader.

My assumption is that the underlying algorithm has got to be very smooth, so linearly interpolating a precomputed table should be reasonably accurate and also very fast. If I use a 128x128x128 table and store two floats (color space coordinates), the storage is just 16 MB.

For anyone in the know about how it is actually computed internally, is there any reason my assumptions about caching and interpolating the results are wrong and likely to cause problems? Could methods like MapDepthPointsToCameraSpace return different results for the same input over short intervals of time (say, more often than application startup)? I'm planning on generating this table at application start and then using it for the lifetime of the app.

    Sunday, October 5, 2014 2:07 AM
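The precomputed-table idea above can be sketched on the CPU before it is uploaded as a 3D texture. In this sketch `mapper_fn` is a hypothetical stand-in for whatever produces ground-truth mappings (e.g. batched MapDepthPointsToCameraSpace calls at startup), and the grid resolution is arbitrary. Trilinear interpolation is exact for functions that are linear in each coordinate, so a smooth underlying mapping should interpolate well:

```python
import numpy as np

def build_table(mapper_fn, width, height, z_min, z_max, n=16):
    """Sample mapper_fn(u, v, z) -> (X, Y, Z) on an n x n x n grid."""
    axes = (np.linspace(0.0, width - 1.0, n),
            np.linspace(0.0, height - 1.0, n),
            np.linspace(z_min, z_max, n))
    table = np.empty((n, n, n, 3))
    for i, u in enumerate(axes[0]):
        for j, v in enumerate(axes[1]):
            for k, z in enumerate(axes[2]):
                table[i, j, k] = mapper_fn(u, v, z)
    return table, axes

def lookup(table, axes, u, v, z):
    """Trilinearly interpolate the table at (u, v, z) - the same thing a
    3D texture fetch with GL_LINEAR filtering does in the shader."""
    cells, fracs = [], []
    for ax, q in zip(axes, (u, v, z)):
        # Find the cell containing q and the fractional position within it.
        i = int(np.clip(np.searchsorted(ax, q) - 1, 0, len(ax) - 2))
        t = (q - ax[i]) / (ax[i + 1] - ax[i])
        cells.append(i)
        fracs.append(float(np.clip(t, 0.0, 1.0)))
    i, j, k = cells
    a, b, c = fracs
    # Blend the eight surrounding samples.
    out = np.zeros(3)
    for di in (0, 1):
        for dj in (0, 1):
            for dk in (0, 1):
                w = ((a if di else 1 - a)
                     * (b if dj else 1 - b)
                     * (c if dk else 1 - c))
                out += w * table[i + di, j + dj, k + dk]
    return out
```

One sizing note: at 128 samples per axis with three floats per texel (camera-space X, Y, Z) the table is 24 MB; the 16 MB figure above corresponds to two floats per texel, i.e. a 2D output such as color-space coordinates.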
Given the nature of the data, size and latency are major issues you need to consider. The more processing you can do on the local machine before sending the data over, the better. Ideally you would send over the depth-aligned color and depth frame data, or vice versa. You will have to profile the devices, but given wireless network speeds/latency and the capabilities of the device, sending over minimal data would be ideal.

There are no offline APIs or modes for the SDK. Camera intrinsic data must be pulled from the device itself, and no two cameras are the same. The coordinate mapper provides the mapping table and a GetDepthCameraIntrinsics function.

    Carmine Sirignano - MSFT

    Monday, October 6, 2014 7:06 PM
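For completeness, the mapping table mentioned above is enough to reproduce the depth-to-camera direction offline: GetDepthFrameToCameraSpaceTable holds one unit-depth (X/Z, Y/Z) factor pair per depth pixel, so scaling by the measured depth recovers the camera-space point. A minimal sketch, assuming the table was serialized as a flat row-major per-pixel array and follows that unit-depth convention:

```python
DEPTH_WIDTH = 512  # Kinect v2 depth frame is 512 x 424

def depth_point_to_camera_space(table, u, v, depth_mm, width=DEPTH_WIDTH):
    """Reproduce MapDepthPointToCameraSpace from a recorded table.

    table: sequence of (x, y) unit-depth factors, one entry per depth
    pixel in row-major order, as recorded from
    GetDepthFrameToCameraSpaceTable. Assumption: the usual convention
    CameraSpacePoint = (x * z, y * z, z) with z in metres.
    """
    fx, fy = table[v * width + u]
    z = depth_mm / 1000.0  # depth frame values are millimetres
    return (fx * z, fy * z, z)
```

Because the table is indexed by pixel only, a single recorded copy covers every frame at that calibration; only the per-pixel depth value changes frame to frame.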