Depth is generated from the time-of-flight infrared (IR) data acquired by the sensor. Time of flight is a different technique from the structured-light sensor used in v1. In a given frame, each pixel will either have a valid depth value or be unknown. The only way to correct for this is to average the data acquired from the SDK. Each frame can be compared to the previous one, on the assumption that the camera didn't move. Based on that, each pixel can be averaged with previous frames (more than one), and the noise should level off to a point.
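A minimal sketch of that kind of temporal averaging is below. This is not SDK code: it assumes you already copy each depth frame into a 16-bit buffer, that a value of 0 marks an unknown pixel, and that the camera is static. The class and function names (DepthAverager, addFrame, average) are just illustrative.

```cpp
#include <cstdint>
#include <cstddef>
#include <vector>

// Running average of depth frames, assuming a static camera.
// Pixels with value 0 are treated as "unknown" and skipped.
class DepthAverager {
public:
    explicit DepthAverager(std::size_t pixelCount)
        : sum_(pixelCount, 0.0), count_(pixelCount, 0) {}

    // Accumulate one depth frame (depth in millimeters, 0 = unknown).
    void addFrame(const std::uint16_t* depth, std::size_t pixelCount) {
        for (std::size_t i = 0; i < pixelCount; ++i) {
            if (depth[i] != 0) {            // ignore unknown pixels
                sum_[i] += depth[i];
                ++count_[i];
            }
        }
    }

    // Write the averaged frame; pixels never seen stay 0 (unknown).
    void average(std::uint16_t* out, std::size_t pixelCount) const {
        for (std::size_t i = 0; i < pixelCount; ++i) {
            out[i] = count_[i]
                ? static_cast<std::uint16_t>(sum_[i] / count_[i])
                : 0;
        }
    }

private:
    std::vector<double>   sum_;
    std::vector<unsigned> count_;
};
```

Feed it each depth frame you get from the reader, then call average() once you have accumulated enough frames.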
Since this is all based on IR data, the reflective properties of the surfaces/environment will determine the amount of noise. With time of flight, the IR data is actively filtered to only look for the wavelengths the sensor has emitted into the area.
If you have highly reflective areas (lots of glass, mirrors, shiny metal, a large amount of direct sunlight), that is going to affect how the IR gets bounced back into the sensor. This also applies to materials that absorb IR.
The only thing you can do is examine how the IR data looks using IR basics; if you are seeing a lot of pure black and bright white, that is going to affect what depth can be detected in those areas.
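As a rough sketch of that kind of check (again not SDK code), the snippet below counts how much of a 16-bit IR buffer is nearly black or nearly saturated; the thresholds are made-up values you would tune for your scene.

```cpp
#include <cstdint>
#include <cstddef>

// Fraction of IR pixels that are nearly black or nearly saturated.
// High fractions suggest regions where depth will likely be unreliable.
struct IrExposureStats {
    double blackFraction;
    double whiteFraction;
};

IrExposureStats checkIrExposure(const std::uint16_t* ir, std::size_t pixelCount) {
    const std::uint16_t kBlackThreshold = 64;     // "pure black": almost no IR return
    const std::uint16_t kWhiteThreshold = 65000;  // "bright white": near saturation
    std::size_t black = 0, white = 0;
    for (std::size_t i = 0; i < pixelCount; ++i) {
        if (ir[i] <= kBlackThreshold)      ++black;
        else if (ir[i] >= kWhiteThreshold) ++white;
    }
    return { static_cast<double>(black) / pixelCount,
             static_cast<double>(white) / pixelCount };
}
```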
Carmine Sirignano - MSFT