Asked by:
How are depth distances measured?
Question

I'm struggling with depth data accuracy, so I wanted to check what I actually get from the sensor.
Here is what I did: I put the sensor on the floor, perfectly parallel to a wall at a distance of 82 cm, and read the depth for each pixel in row 240.
Then I ran the test for ~12000 frames and computed the average distance for each pixel. Here's what I got:
Because the device was placed perfectly parallel to the wall, I would expect the line to be perfectly horizontal too. But that's not the case.
Alternatively, I considered a model where depth is measured from the camera to the specific point. That would explain why distances to points near the edges are longer than those in the middle. But that is not the case either. In the middle of the screen, the real distance to the wall was ~820mm, while at pixel 1 (the leftmost one on the picture, and the rightmost in reality) the distance was ~920mm. That makes a ~100mm difference. However, the picture shows that the difference between the highest and lowest values is around 40mm.
Can you please explain how the depth data works?
thanks,
Grzegorz
All replies

Hi,
You should have a look at this paper on the Kinect v2, which came out a couple of days ago:
http://www.int-arch-photogramm-remote-sens-spatial-inf-sci.net/XL-5-W4/93/2015/isprsarchives-XL-5-W4-93-2015.pdf
The depth given by the camera is not the distance from the camera center to a point; it is the z coordinate of the point in the camera reference frame:
https://msdn.microsoft.com/en-us/library/dn785530.aspx
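To make the difference concrete, here is a minimal sketch using pinhole back-projection. The intrinsics below are rough, assumed values; the real ones vary per device and come from the SDK's CoordinateMapper. For a wall parallel to the sensor, the reported depth (the z coordinate) is constant across the row, while the straight-line distance to each point grows toward the edges:

```python
import math

# Rough, assumed Kinect v2 depth-camera intrinsics (512x424 frame);
# real values differ per device and come from CoordinateMapper.
FX, FY = 366.0, 366.0   # focal lengths in pixels
CX, CY = 256.0, 212.0   # principal point

def to_camera_space(u, v, depth_mm):
    """Back-project a depth pixel to a 3D point (mm) in the camera frame.
    The reported depth is the z coordinate, not the ray length."""
    z = depth_mm
    x = (u - CX) * z / FX
    y = (v - CY) * z / FY
    return x, y, z

# Wall parallel to the sensor at 820 mm: every pixel reports z = 820,
# but the radial (straight-line) distance grows toward the frame edge.
for u in (256, 384, 511):
    x, y, z = to_camera_space(u, 212, 820.0)
    radial = math.sqrt(x * x + y * y + z * z)
    print(f"pixel u={u}: z = {z:.0f} mm, radial = {radial:.0f} mm")
```

This is why a flat, parallel wall should produce a (nearly) horizontal line in a per-pixel depth plot: the sensor reports z, not the ray length.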
Ben
Proposed as answer by Carmine Si - MSFT (Microsoft employee), Tuesday, March 3, 2015 6:29 PM

Thanks for the quick answer and the article. I find it very interesting.
Based on what you said, the line in my test should be (almost perfectly) horizontal.
Am I right? If so, could it mean that the device is uncalibrated or broken?
I repeated the test for various distances, and each time the line resembles the one I posted above.

Grzegorz

I don't know much about depth, but here are a couple of things you may have already checked:
- Are you sure you have no sources of IR interference (such as sunlight)?
- Are you sure the wall's reflectivity is the same across the entire plane you are measuring?
Edited by jmmroldan, Wednesday, March 4, 2015 9:18 AM

The depth deviation you observe from left to right is actually quite large (4 cm) if I read your scale correctly.
Deviations in the paper are smaller (up to 1.6 cm + 1 cm, see figure 12).
If you want better results, try increasing the distance to the wall (errors at 80 cm are larger than at 1.25 m, for instance; have a look at figure 11). I agree with jmmroldan on reflectivity; also, perfect parallelism between the sensor and the wall is tricky to achieve. Try different setups and see under which conditions you get better results (maybe try a different wall too).
I don't think your sensor has a particular problem, but a geometric calibration of the IR camera may help (section 4.1 of the paper).

Thanks for all your answers.
The wall is covered with gypsum boards; it's really hard to get a smoother surface.
Sunlight issues don't exist either. The room is evenly lit.
Actually, the article is missing the information I'm looking for. Figure 12 shows how the distance from the wall influences the deviation, but they don't mention whether the measured deviation (at a given distance) is the same for all pixels (from left to right, or from top to bottom, depending on how the sensor is oriented). That would actually require a 4D chart (row, column, distance from wall, deviation).

Anyway, I'll check the geometric calibration. Hopefully that will help.

If you want to understand this better, I would recommend taking the whole row, fitting a best-fit line to it (after converting to CameraSpacePoints), and plotting how the points deviate from the line.
Alternatively, you could point the camera so that the wall fills the entire frame, fit the entire frame to a plane, and plot that deviation. If you do this, I think you will find a clearer story than your current plot tells, because these techniques don't rely on properly interpreting a depth value or on the precise nature of your physical setup.
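As a sketch of the plane-fit approach (the points here are fabricated stand-ins for real CameraSpacePoints; the tilt and noise levels are made up for illustration):

```python
import numpy as np

# Fabricate a slightly tilted, noisy wall in camera space (metres),
# standing in for points from MapDepthFrameToCameraSpace.
rng = np.random.default_rng(0)
xs, ys = np.meshgrid(np.linspace(-0.5, 0.5, 64), np.linspace(-0.4, 0.4, 48))
zs = 0.82 + 0.05 * xs + rng.normal(0, 0.002, xs.shape)
points = np.column_stack([xs.ravel(), ys.ravel(), zs.ravel()])

# Least-squares fit of the plane z = a*x + b*y + c.
A = np.column_stack([points[:, 0], points[:, 1], np.ones(len(points))])
(a, b, c), *_ = np.linalg.lstsq(A, points[:, 2], rcond=None)

# Orthogonal deviation of each point from the fitted plane; plotting this
# against pixel position reveals systematic (non-noise) patterns.
residual_z = points[:, 2] - A @ np.array([a, b, c])
deviation = residual_z / np.sqrt(a * a + b * b + 1.0)

print(f"fitted plane: z = {a:.3f}x + {b:.3f}y + {c:.3f}")
print(f"deviation std: {deviation.std() * 1000:.2f} mm")
```

Because the fit absorbs any residual tilt of the sensor, the deviations isolate the sensor's own distortion from the imperfections of the physical setup.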
Good luck!


Let me revive this topic.
I've continued investigating this, and now I have even more doubts than before.
I did an even simpler experiment. I attached a cuboid to the wall (its faces and edges have no distortions).
Then, for each point along the x coordinate, I measured a delta, which is the difference between the depth to the wall and the depth to the box surface.
In theory, all these deltas should be equal to the box thickness.
The picture below presents what I did:

Then I recorded ~1000 frames and printed a chart of the average values to demonstrate what I got:
Because the thickness of the box is constant, I would expect a flat line. Alternatively, I could imagine a tilted line (resembling a y = a*x function). That could be due to a perspective effect (deltas constantly increasing or decreasing along the x coordinate).
But I also took more measurements with the Kinect placed at different angles. Each time I got similar (neither equal nor proportional) results.

And I can't find an explanation for this chart.
I checked a few Kinect devices; each of them behaves in a similar way. I also tried converting values to camera space, made sure light or surface had no impact, etc. All with no success.
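For reference, the per-pixel averaging and the delta computation could be sketched like this (the frames, box position, and thickness here are fabricated stand-ins for the real recordings):

```python
import numpy as np

ROW = 240  # the scanline that crosses the box

def mean_row(frames, row=ROW):
    """Per-pixel average depth along one row, over many frames."""
    return np.stack([f[row].astype(np.float64) for f in frames]).mean(axis=0)

# Fabricated recordings (424x512, depth in mm): a wall at ~820 mm with
# +/-5 mm noise, then the same wall with a 100 mm thick box over
# columns 200-300.
rng = np.random.default_rng(1)
wall_frames = [rng.integers(815, 826, (424, 512)).astype(np.uint16)
               for _ in range(200)]
box_frames = []
for _ in range(200):
    f = rng.integers(815, 826, (424, 512)).astype(np.uint16)
    f[:, 200:300] -= 100  # box surface is 100 mm closer than the wall
    box_frames.append(f)

# Ideally, the delta over the box equals the box thickness everywhere.
delta = mean_row(wall_frames) - mean_row(box_frames)
print(f"mean delta over the box: {delta[200:300].mean():.1f} mm")
```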
Do you guys have any idea what is wrong with my test? Is there something I didn't think of?
Could you also tell me how you made sure your devices are calibrated properly? It would also be useful to know how you calibrate them (there are many methods to calibrate the Kinect, but all of them are rather scientific and hard for a normal user to do at home; with this test I just wanted to make sure that the devices I use work fine).

Hello,
What is the x axis in your graph?
If you use raw depth from the sensor, how do you make sure the sensor is not tilted with respect to the wall? Otherwise you need to know the positioning of the sensor with respect to the wall (through an extrinsics calculation with a checkerboard, for instance: see the calibration link below) and do something like projecting the distances from the As to the Bs onto the normal of the wall.
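That projection could be sketched as follows (the plane fit uses an SVD; the points are fabricated, and in practice they would be camera-space points from the SDK):

```python
import numpy as np

def fit_plane(points):
    """Least-squares plane through Nx3 points: returns a unit normal n and
    offset d such that n . p + d = 0 for points p on the plane."""
    centroid = points.mean(axis=0)
    _, _, vt = np.linalg.svd(points - centroid)
    n = vt[-1]  # direction of smallest variance = plane normal
    return n, -n @ centroid

def signed_distance(points, n, d):
    """Distance of each point to the plane, measured along the normal."""
    return points @ n + d

# Fabricated check: a tilted wall (z = 0.82 + 0.1*x, in metres) and a box
# face shifted 0.1 m away from it along the wall's normal.
rng = np.random.default_rng(2)
xy = rng.uniform(-0.3, 0.3, (500, 2))
wall = np.column_stack([xy[:, 0], xy[:, 1], 0.82 + 0.1 * xy[:, 0]])
n, d = fit_plane(wall)
box = wall[:100] - 0.1 * n
print(f"box thickness estimate: {np.abs(signed_distance(box, n, d)).mean():.3f} m")
```

Measuring along the fitted normal rather than along z removes the sensor tilt from the delta entirely.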
But note that if the deltas you got are in mm, your results are not that bad: a variation of 18 mm between the minimum and maximum delta is not bad for a Kinect. It could certainly be better in optimal conditions: use the central area of the field of view, make sure the range is optimal, and avoid reflective surfaces to limit multipath interference. And yes, you may want to recalibrate at least the IR camera. If you are familiar with Matlab, this toolbox is easy to use:
http://www.vision.caltech.edu/bouguetj/calib_doc/
Ben

The x axis corresponds to the pixel.x value in the raw depth data. Actually, it's shifted by ~30 pixels (I drew only the middle part), so pixel 1 on the chart corresponds to ~pixel 30 in the depth stream, and pixel 449 on the picture corresponds to ~pixel 480 in the depth data.
Initially I worked with the color image => CameraSpacePoint flow, but I noticed this issue and narrowed the problem down. That is how I ended up with the description above.
Answering your second question: I have no certainty that the Kinect is not tilted with respect to the wall. That is why I wrote before that I could understand a tilted line on the chart ( / or \ ); that would be due to a perspective effect. I could also accept a 'V'-shaped line on the chart (perspective applies to both the left and right sides).
So the trend is fine (on the left-hand side the depth is 95 and on the right-hand side the depth is 115). If the values increased constantly and proportionally, I would have no questions.
Below you'll find the flow I started with. For the exercise I described in my previous post, I used the color picture to find the CameraSpacePoint.
And here is the result:
What I can't understand, however, is the 'wave' in the middle part of the chart.
Is it a result of how the Kinect is built? My second question is: how do you guys build algorithms that require precise data (if the uncertainty is so high)? Unless I'm doing something wrong...
I'll appreciate any help or advice on that.

The wave part: you can find the same pattern in figure 12a of the paper I sent you a link to. I have no idea what causes this pattern.
Video games and other UI applications do not actually require much precision; ~1 cm is more than enough. If you are doing metrology, there are more precise (and also much more expensive) sensors out there.
Ben

Thanks a lot for the answer. I was hoping to hear that I had done something wrong.
It would be interesting to know if you guys also observe this phenomenon. Looking at your apps (e.g. Kinect Studio), it feels like the phenomenon doesn't exist there.
And of course this issue can be fixed by creating a correction table (the only problem is that it has to be generated for each sensor), so there is no need to buy a more precise sensor. The Kinect is really a powerful device; we just need to know what we are working with.

Grzegorz
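The per-sensor correction table mentioned above could be sketched like this (the bias pattern and all numbers are fabricated for illustration; a real table would be built from recordings of a flat wall at a known distance):

```python
import numpy as np

def build_correction_table(flat_wall_frames, true_depth_mm):
    """Per-pixel bias: mean measured depth minus the known ground truth."""
    mean_depth = np.stack([f.astype(np.float64)
                           for f in flat_wall_frames]).mean(axis=0)
    return mean_depth - true_depth_mm

def correct(frame, table):
    """Subtract the stored per-pixel bias from a live frame."""
    return frame.astype(np.float64) - table

# Fabricated sensor whose left half reads 10 mm too far, plus noise.
bias = np.zeros((424, 512))
bias[:, :256] = 10.0
rng = np.random.default_rng(3)
frames = [820.0 + bias + rng.normal(0, 2, (424, 512)) for _ in range(100)]

table = build_correction_table(frames, 820.0)
fixed = correct(frames[0], table)
print(f"residual bias after correction: {abs(fixed.mean() - 820.0):.2f} mm")
```

Note that such a table only removes a distance-independent bias; if the wave changes with range, the table would need to be built per distance band as well.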

Actually, I'm not affiliated with Microsoft.
I'm sure they have extensively tested the Kinect v2, but they won't even share precision specs. One of the problems with ToF imaging is that multipath interference can dramatically decrease precision, so precision is strongly scene-dependent.
How to overcome this? Measurements at different frequencies, but that is only research for now:
http://research.microsoft.com/pubs/245069/chr_mpi_cvpr_15.pdf
Ben