Kinect Fusion voxel resolution precision?

  • Question

  • Could someone please tell me: if the Kinect Fusion depth threshold minimum is increased and the maximum is reduced, does that smaller range increase the overall precision of the scanned volume, or does it stay the same?

    Say I set Kinect Fusion's depth threshold minimum to 2 m and the maximum to 3 m, giving a scanned range of 3 m - 2 m = 1 m. Does a volume voxels-per-meter setting of, say, 640 then mean I would get a voxel depth precision of 1 m / 640? Or does the voxels-per-meter value apply to the Kinect's complete depth range instead? Also, how are width and height affected by the depth threshold settings, and how do I calculate the precision in those two remaining axes?

    TIA

    P.S. If the volume voxel resolution is set to the maximum for all three axes (768 x 768 x 768), what is the minimum amount of GPU memory needed to make Kinect Fusion work?

    Sunday, November 29, 2015 1:08 AM

Answers

  • The depth filter just removes things outside the set range from being used in the fusion registration. Volume size is set by vox/m and voxel resolution, e.g. 512 vox/m and a voxel resolution of 512 x 512 x 256 gives a volume of 1 m x 1 m x 0.5 m, with each voxel measuring 1.95 x 1.95 x 1.95 mm.
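
    To make that arithmetic explicit, here is a minimal sketch (plain C++, not SDK code; all the names are mine) of how the cuboid extents and voxel size fall out of the two parameters:

        #include <cstdio>

        // Hypothetical helper, not part of the Kinect SDK: given voxels per
        // meter and per-axis voxel counts, compute the physical extents of
        // the reconstruction cuboid and the edge length of a single voxel.
        int main()
        {
            const float voxelsPerMeter = 512.0f;          // vox / m
            const int resX = 512, resY = 512, resZ = 256; // voxel resolution

            std::printf("volume: %.2f m x %.2f m x %.2f m\n",
                        resX / voxelsPerMeter,
                        resY / voxelsPerMeter,
                        resZ / voxelsPerMeter);    // 1.00 x 1.00 x 0.50
            std::printf("voxel edge: %.2f mm\n",
                        1000.0f / voxelsPerMeter); // ~1.95 mm
            return 0;
        }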

    On my system, using a modified WPF explorer sample,

    512 x 512 x 512 : 1285 MB

    640 x 640 x 640 : 2260 MB

    768 x 768 x 768 : 3720 MB

    My 4 GB GTX 680M was chugging along at 5 fps at 768.
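
    For a rough sense of where those numbers come from: the volume stores a truncated signed distance plus an integration weight for every voxel, so memory grows with the cube of the resolution. A back-of-the-envelope model fitted to the three measurements above (about 8 bytes per voxel plus roughly 260 MB of fixed overhead; these constants are inferred from my numbers, not documented by the SDK):

        #include <cstdio>

        int main()
        {
            // Constants fitted to the measurements above; empirical guesses,
            // not documented figures.
            const double bytesPerVoxel   = 8.0;
            const double fixedOverheadMB = 260.0;

            const long long sizes[] = {512, 640, 768};
            for (long long res : sizes)
            {
                const long long voxels = res * res * res;
                std::printf("%lld^3: ~%.0f MB\n", res,
                            fixedOverheadMB
                                + voxels * bytesPerVoxel / (1024.0 * 1024.0));
            }
            return 0;
        }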

    KinectFusion does not work well for fine resolution and large scanning sizes, for pretty much these exact reasons. In the original paper (Newcombe et al.) they say that although it is speed optimised, it is not memory optimised. If you want to do large-scale, fine-resolution scanning then you should look at KinFu Large Scale or voxel hashing:

    http://pointclouds.org/documentation/tutorials/using_kinfu_large_scale.php

    https://github.com/nachtmar/VoxelHashing/

    • Proposed as answer by Phil Noonan Thursday, December 3, 2015 8:48 AM
    • Marked as answer by vic456 Friday, December 4, 2015 1:33 AM
    Monday, November 30, 2015 9:22 AM

All replies

  • Thank you for your answer; I thought so, but something just confused me...

    Sometimes it seems that when I reduce the depth range, Kinect Fusion produces a better result: smoother, with more points/surfaces reconstructed, like the floor, which is missing in higher-resolution meshes and which I can't get back there. That doesn't happen when the depth range is left unchanged (larger). I'm guessing this has something to do with the number of points being tracked. In your example, the z-axis resolution (res) divided by voxels per meter (vpm) gives 256/512 = 0.5 m, which I gather is the maximum reconstructed depth no matter what the depth thresholds are set to (although I'm still unsure where that volume depth starts or ends when the thresholds span more than the res/vpm size). When a depth range smaller than that is used, fewer points are tracked, which could eventually spoil the tracking (and the noise from the floor might be what removes it from higher-res scans, leaving just the object)...

    Also, stranger still, a lower vpm sometimes seems to produce smoother, even seemingly more complete models. Is Kinect Fusion somehow better at smoothing points at lower resolutions, or is that some hardware limitation? And how much noise does the Kinect produce, i.e. what is its maximum precision? I guess it's not as simple as calculating 1 m / res; I remember seeing the v1 noise/precision quoted as being near the 1 cm range. Is v2 three times better? Finally, are there any instructions on what settings give the maximum precision with a Kinect v2 for scanning a structure two meters tall and half a meter wide and deep? Should that be done with a resolution of 768 voxels on the y-axis at 384 vox/m, and correspondingly 256-voxel resolutions for the x- and z-axes (the nearest available to half of 384, I guess, if that's even available; I think 768 is in the C++ Fusion Explorer, so I calculated from there)? Or is there some even better way to bound scans somehow? tnx



    • Edited by vic456 Friday, December 4, 2015 2:05 AM
    Friday, December 4, 2015 2:02 AM
  • Think of the fusion volume as a cuboid in front of the Kinect, and any structure that is physically inside this cuboid can be used in the fusion algorithm. You can set the size of each voxel element by changing the voxel / metre parameter, but the size of your cuboid will change unless you also change the number of voxels in the volume.

    In the fusion sample, try changing the integration weight to 1. It's then pretty much just fitting a surface to the raw depth frame, and the quality of the fusion model is terrible, regardless of the voxel size or dimensions, because the Kinect raw depth is very noisy. If you use a higher integration weight, you will see the surfaces smoothing out as the fusion algorithm integrates the depth frames into the fusion volume (each voxel gets a better average estimate of the true surface from the multiple noisy depth frames), and, like you mentioned, if your voxels are bigger (low vox/m), you get a smoother surface within each voxel. If you have a very small voxel size (large vox/m), you will have a very fine mesh for fusion to integrate the depth into. At the start this will just give you fine-resolution noisy surfaces, but over time, if your scene only moves rigidly, the new depth frames being integrated should smooth out the surface.
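
    What the integration weight effectively controls is a clamped running average of the signed-distance samples landing in each voxel. A minimal sketch of that update (the standard Curless-Levoy-style running average that KinectFusion builds on; the types and names here are mine, not the SDK's):

        // Per-voxel running average of truncated signed distance (TSDF)
        // samples. maxWeight caps the history: a cap of 1 means each new
        // frame replaces the voxel outright (hence the noisy result), while
        // a higher cap lets many noisy frames average out to a smooth
        // estimate of the true surface.
        struct Voxel
        {
            float tsdf   = 0.0f; // truncated signed distance to the surface
            float weight = 0.0f; // effective number of samples integrated
        };

        void integrateSample(Voxel& v, float sampleTsdf, float maxWeight)
        {
            v.tsdf   = (v.tsdf * v.weight + sampleTsdf) / (v.weight + 1.0f);
            v.weight = (v.weight + 1.0f < maxWeight) ? v.weight + 1.0f
                                                     : maxWeight;
        }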

    Another thing to note from your post, you will get a varying degree of quality of scan depending on a few things. If you have a big volume, you will have a high chance of non-rigidly moving objects in the scene. If you have a small volume then you may not have sufficient 3D structure for fusion to reliably register depth data to.

    There is an upper limit on the number of voxels, dependent on your GPU, since each voxel is allocated its own piece of GPU memory. There is technically no upper limit on the vox/m parameter (other than sensibility); I've gone up to 2048 on a near-mode-modified Kinect v2 before, but you sacrifice the absolute size of the fusion cuboid. With a normal Kinect that would mean you see nothing in the volume, since the whole cube would sit at a depth < 50 cm, where you get no depth data from the Kinect.

    In the C++ fusion explorer sample, there is a flag in KinectFusionParams.h called m_bTranslateResetPoseByMinDepthThreshold. If this is set to true, then when you reset the volume, the cuboid is translated by the minimum depth on the slider. Essentially you shift the volume in z, so you can use a high vox/m volume and still get depth points from the sensor. The further you get from the Kinect, the lower the quality of the raw depth data, simply because the Kinect optics mean each depth pixel covers a larger area; but if you integrate for longer you can recover some resolution in the scan.
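
    From memory, the reset path with that flag enabled looks roughly like the sketch below: the world-to-volume transform is shifted along z before the reconstruction is reset. Treat the member names other than the flag itself as paraphrased rather than exact:

        // Shift the volume's near face out to the minimum depth threshold so
        // a high vox/m cuboid still sits where the sensor returns depth data.
        Matrix4 worldToVolumeTransform = m_defaultWorldToVolumeTransform;

        if (m_bTranslateResetPoseByMinDepthThreshold)
        {
            // The transform works in voxel units, hence the vox/m scaling.
            worldToVolumeTransform.M43 -=
                m_fMinDepthThreshold * m_reconstructionParams.voxelsPerMeter;
        }

        // Reset with the camera pose and the translated world-to-volume
        // transform.
        HRESULT hr = m_pVolume->ResetReconstruction(&m_worldToCameraTransform,
                                                    &worldToVolumeTransform);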

    In most cases, the params have to be tuned for the specific task at hand. For example, my near-mode Kinect can easily resolve a tear duct with only a 256 x 256 x 256 volume and 1536 vox/m, yet if I scan someone's torso I need something like a 512 x 512 x 512 volume and 256 vox/m. Both sets of params give 'good' scans for their tasks, but fail for the other.

    Friday, December 4, 2015 8:15 AM
  • I know it's been a long time since you answered, but could you tell me how you modified your WPF Explorer?

    Edit: nevermind, I've found it

    Thursday, September 1, 2016 12:18 PM
  • How did you modify it?
    Monday, December 9, 2019 11:03 AM