Depth Gradient for Implementing the SVGF Denoiser

When I supervised a student implementing different image denoisers in my line renderer LineVis, we faced the problem that SVGF, one of the most widely used denoisers for real-time path-traced applications, depends on ∇z(p), which the paper describes as the "gradient of clip-space depth with respect to screenspace coordinates" [1] at the point p. It is used in formula (3) of the paper as <∇z(p), p - q>, where the angle brackets denote the scalar product and q denotes the pixel coordinates of another point.
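To make the role of the scalar product concrete, here is a minimal sketch of the depth weight, assuming formula (3) has the form exp(-|z(p) - z(q)| / (σ_z |<∇z(p), p - q>| + ε)). The function name `depth_weight` and the default values for `sigma_z` and `eps` are illustrative assumptions, not taken from any particular implementation.

```python
import math

def depth_weight(z_p, z_q, grad_z_p, p, q, sigma_z=1.0, eps=1e-6):
    """Depth edge-stopping weight between pixels p and q (hypothetical helper)."""
    # Scalar product <grad z(p), p - q>: the depth change the local slope predicts.
    dx, dy = p[0] - q[0], p[1] - q[1]
    predicted_change = grad_z_p[0] * dx + grad_z_p[1] * dy
    # The actual depth difference is measured relative to the predicted change.
    return math.exp(-abs(z_p - z_q) / (sigma_z * abs(predicted_change) + eps))

# Same sloped surface: the depth difference matches what the slope predicts.
w_same = depth_weight(1.0, 2.0, (0.5, 0.0), (10, 10), (12, 10))
# Depth discontinuity: the difference is far larger than the slope predicts.
w_edge = depth_weight(1.0, 11.0, (0.5, 0.0), (10, 10), (12, 10))
```

In this sketch, the neighbor on the same sloped surface keeps a much higher weight than the neighbor across the discontinuity, which is why an accurate ∇z(p) matters for the filter.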

We surveyed how different software packages implement this. The Falcor framework uses a variable fwidthZ, which is read from a buffer written by a previous rasterization pass using the formula max(abs(ddx(linearZ)), abs(ddy(linearZ))). ddx and ddy compute the derivatives in screen space (the GLSL equivalents are dFdx and dFdy). While this implementation is not exact, the authors' intention was probably to avoid storing two floats for the depth gradient in their g-buffer, and the maximum of the gradient magnitudes in the x and y directions may have provided a reasonable approximation.
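The following sketch emulates that formula on the CPU. It assumes a simplified model in which the screen-space derivatives are plain differences between neighboring invocations in a 2x2 block of pixels; real hardware may compute them differently. The function name `fwidth_z` and the sample depths are made up for illustration.

```python
def fwidth_z(quad):
    """Emulate max(abs(ddx(linearZ)), abs(ddy(linearZ))) for one 2x2 block.

    quad[y][x] holds the linear depth of the pixel at offset (x, y).
    Assumption: derivatives are differences to the horizontal and vertical neighbor.
    """
    ddx = quad[0][1] - quad[0][0]  # change in depth one pixel to the right
    ddy = quad[1][0] - quad[0][0]  # change in depth one pixel down
    return max(abs(ddx), abs(ddy))

# A planar surface with slope 0.25 in x and 0.5 in y.
quad = [[1.0, 1.25],
        [1.5, 1.75]]
fwidth = fwidth_z(quad)
```

Here the two-component gradient (0.25, 0.5) collapses to the single scalar 0.5, which shows what information is lost by storing only one float.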

However, we faced two additional problems.

  • LineVis uses denoisers for denoising ray-traced ambient occlusion images, and the primary hit information is not rasterized, but ray traced. However, ddx and ddy (dFdx and dFdy) are only available when using rasterization.
  • I had heard that GPUs may implement fragment shader derivatives in such a way that they are computed across, e.g., inflections in a triangle mesh, which, according to an acquaintance, does sometimes result in minor visual artifacts in applications like video games. GPUs usually organize fragment shader invocations into 2x2 blocks, as shown below. If the orange and green triangles belong to the same mesh, we may get large gradients if the depth discontinuity between the triangles is large.

Another approach the student tested was computing the gradients from the depth g-buffer image. While that is compatible with ray-traced first-hit information, it suffers from even worse artifacts than fragment shader derivatives, as it is guaranteed to be incorrect at all object borders.
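A small sketch shows why gradients taken from the depth image break down at object borders. It uses forward differences along one row of a depth buffer; the helper name `depth_gradient_x` and the depth values are hypothetical.

```python
def depth_gradient_x(depth_row):
    """Forward differences along one row of a depth image (hypothetical helper)."""
    return [depth_row[i + 1] - depth_row[i] for i in range(len(depth_row) - 1)]

# Two surfaces with the same slope (0.25 per pixel), separated by an object border
# between the third and fourth pixel.
row = [1.0, 1.25, 1.5, 5.0, 5.25, 5.5]
grads = depth_gradient_x(row)
# grads == [0.25, 0.25, 3.5, 0.25, 0.25]
```

The value 3.5 at the border describes neither surface. It is the depth jump between the two objects, so any pixel whose difference stencil crosses a border receives a wrong gradient.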

In the end, I derived the following formula that can be used to compute ∇z(p) from the slope of the triangle in screen space (n is the normal in camera space).

$$ \nabla z = (n_x / \sqrt{1 - n_x^2}, n_y / \sqrt{1 - n_y^2})^T $$
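As a rough illustration, the formula above can be evaluated as follows. This is a Python sketch rather than the GLSL code used in LineVis; the function name `depth_gradient_from_normal` is made up, and the clamp of the denominator is an added assumption to avoid a division by zero when a component of the normal reaches ±1.

```python
import math

def depth_gradient_from_normal(n, min_denominator=1e-6):
    """Evaluate (n_x / sqrt(1 - n_x^2), n_y / sqrt(1 - n_y^2)) for a unit normal n.

    n is the camera-space normal as an (x, y, z) tuple.
    min_denominator is an assumption of this sketch, not part of the formula.
    """
    nx, ny = n[0], n[1]
    # max(..., 0.0) guards against tiny negative values caused by rounding.
    gx = nx / max(math.sqrt(max(1.0 - nx * nx, 0.0)), min_denominator)
    gy = ny / max(math.sqrt(max(1.0 - ny * ny, 0.0)), min_denominator)
    return (gx, gy)

# A surface tilted by 30 degrees about the camera's y axis.
theta = math.radians(30.0)
normal = (math.sin(theta), 0.0, math.cos(theta))
grad = depth_gradient_from_normal(normal)
```

For this normal, the x component evaluates to tan(30°) ≈ 0.577 and the y component to 0, which matches the slope one would expect for a plane tilted only about the y axis.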

The corresponding GLSL code can be found in the GitHub repository of LineVis:

This formula is derived from the following equality:

$$ \cot(\arccos(A)) = \frac{\cos(\arccos(A))}{\sin(\arccos(A))} = \frac{A}{\sin(\arccos(A))} = \frac{A}{\sqrt{1 - A^2}} $$

... where the x and y components of the normal vector can be thought of as the cosines of the projected angles in the x and y directions between the triangle normal and the camera view direction (the dot product of two unit vectors yields the cosine of the angle between them, and the camera axes are just unit vectors in view space). The slope should be equivalent to the cotangent of the angle, but I could not find the complete derivation I worked out back when I wrote down this formula.

  1. Spatiotemporal Variance-Guided Filtering: Real-Time Reconstruction for Path-Traced Global Illumination (2017). Christoph Schied, Anton Kaplanyan, Chris Wyman, Anjul Patney, Chakravarty R. Alla Chaitanya, John Burgess, Shiqiu Liu, Carsten Dachsbacher, Aaron Lefohn, Marco Salvi. Proceedings of High Performance Graphics 2017.