Filament is an open-source Physically Based Renderer (PBR) targeting mobile platforms, Android in particular. It features basic implementations of temporal anti-aliasing (TAA) and upscaling.
Below we discuss a few implementation details of temporal anti-aliasing and upscaling:
- The correct 2D application of the Lanczos filter.
- The shape, size and position of the anti-aliasing filter.
- The history feedback parameter.
Note
A Survey of Temporal Antialiasing Techniques is an excellent starting point if you're not familiar with temporal antialiasing and upscaling.
Temporal anti-aliasing implementations need to sample both the input and history buffers at arbitrary texture coordinates due to jittering and reprojection respectively. Bilinear sampling is inadequate because it results in an overly blurred image and exhibits anisotropic artifacts.
Filament’s TAA implementation uses a 2D Catmull-Rom filter for sampling the history buffer and a Blackman-Harris approximation for the input. These filters were chosen without putting too much thought into it. Catmull-Rom is a high-quality and very efficient filter, while Blackman-Harris was suggested in the “High Quality Temporal Supersampling” Siggraph 2014 presentation by Brian Karis – so that was the end of it.
For temporal upscaling however, a filter that preserves more details is preferable. Looking around the internet a bit, the Lanczos filter seemed to be a popular choice – it’s used by both FSR and SGSR, so it seemed natural to use it for sampling the input buffer.
Lanczos is a sinc-windowed ideal reconstruction filter:

$$L_a(x) = \begin{cases} sinc(x)\,sinc(x/a) & \text{if } |x| < a \\ 0 & \text{otherwise} \end{cases} \qquad sinc(x) = \frac{\sin(\pi x)}{\pi x}$$

Lanczos can be used effectively for resampling a digital signal or as a low-pass filter.
The parameter $a$ controls the size of the filter's support: $a = 2$ yields Lanczos-2 and $a = 3$ yields Lanczos-3.
Lanczos as defined above is a 1D filter, but obviously here we need a 2D application of it. That's where things start to become weird. There is something peculiar in the various usages of Lanczos as a 2D filter: sometimes it is used as a radial basis function (RBF), $L(\rho)$ with $\rho = \sqrt{x^2 + y^2}$, and at other times it is used as a separable filter, $L(x, y) = L(x)\,L(y)$.
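To make the two 2D applications concrete, here is a small Python sketch (the function names are mine) of the 1D Lanczos kernel and both 2D variants discussed above:

```python
import math

def sinc(x):
    # Normalized sinc: sin(pi*x) / (pi*x), with sinc(0) = 1.
    if x == 0.0:
        return 1.0
    return math.sin(math.pi * x) / (math.pi * x)

def lanczos(x, a=2):
    # 1D Lanczos kernel: sinc windowed by a wider sinc, truncated to |x| < a.
    if abs(x) >= a:
        return 0.0
    return sinc(x) * sinc(x / a)

def lanczos2d_separable(x, y, a=2):
    # Separable application: product of two 1D kernels.
    return lanczos(x, a) * lanczos(y, a)

def lanczos2d_radial(x, y, a=2):
    # Radial basis function (RBF) application: 1D kernel of the distance.
    return lanczos(math.hypot(x, y), a)
```

Note how, at the diagonal sample $(1, 1)$, the separable variant returns zero while the radial variant returns a negative weight: the two definitions genuinely disagree.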
For example, FSR2 uses the separable application for sampling the history buffer, but the RBF version for sampling the input buffer. Looking around the internet, the overwhelming explanation given is that Lanczos is not separable, but for performance reasons it is often approximated by the separable version. This is for example the explanation given here, but there are many other sources that make the same claim.
Oddly, the Lanczos Wikipedia page unambiguously states that the correct 2D application is the separable one, $L(x, y) = L(x)\,L(y)$. The claim that the separable form is merely a performance-driven approximation of the radial one therefore deserved some scrutiny.
I finally got around to implementing Lanczos-2 in Filament using the RBF and supposed "correct" definition, and the resulting image was noticeably sharpened.
Moreover, this sharpening happened even when centering the Lanczos kernel exactly at pixel centers. We would expect the filter to be a no-op in that case, as it is in 1D.
The 1D Lanczos filter is a no-op when centered on an input sample
When looking at it more closely, it is clear that it cannot be a no-op, since the corner samples are not located at the same distance as the “cross” samples, so they receive a negative weight:
The "cross" samples all get a filter coefficient of exactly zero, while the
"corner" samples get a negative coefficient. The middle sample's coefficient
being exactly 1.
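This is easy to verify numerically. Evaluating the radial Lanczos-2 kernel, centered exactly on a pixel center, at the neighboring integer offsets:

```python
import math

def lanczos2_radial(x, y):
    # Radial Lanczos-2: the 1D Lanczos kernel evaluated on the distance
    # to the kernel center, truncated at rho >= 2.
    rho = math.hypot(x, y)
    if rho >= 2.0:
        return 0.0
    if rho == 0.0:
        return 1.0
    s = lambda t: math.sin(math.pi * t) / (math.pi * t)
    return s(rho) * s(rho / 2.0)

center = lanczos2_radial(0, 0)  # exactly 1
cross  = lanczos2_radial(1, 0)  # ~0: distance 1 is the kernel's first zero
corner = lanczos2_radial(1, 1)  # negative: distance sqrt(2) lands in the negative lobe
```

So even when perfectly centered, the radial kernel applies a negative weight to the four diagonal neighbors, which is exactly the sharpening behavior described above.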
Clearly, something isn't right. And when that happens, the best solution is to go back to first principles and stop trusting the Internet.
More precisely, what is an image generated by the GPU?
All the GPU does, really, is to sample triangles on a regular grid. These samples are ultimately stored into memory, which, in our case, is a texture. The value of each sample is determined by running a fragment shader.
The key point to realize here is that the GPU has the same function as a 2D ADC (Analog-Digital Converter), sampling analytic geometry (triangles) at specific locations on a regular grid.
The Nyquist-Shannon theorem tells us that to perfectly reconstruct a 2D image, it must be sampled at a rate of at least twice the highest horizontal and vertical frequencies present.
Note
Unless otherwise noted, all spatial-frequency graphs below have coordinate units in radian/pixel; divide by $2\pi$ to convert to cycles/pixel.
After rendering, we're left with a sampled 2D image, whose samples are stored in a texture, in a grid organized in rows and columns: a texture is not made of small little squares.
Therefore, sampling can be seen as multiplying the original (analog/analytic) image by a regular 2D Dirac comb (also called a bed-of-nails function). In the frequency domain this corresponds to convolving the image's spectrum with another bed-of-nails function. The convolution splats a copy of the image's spectrum at each Dirac pulse of the comb, effectively replicating the spectrum on a regular grid in the frequency domain, ad infinitum:
Sampling an image (top-left) is equivalent to multiplying it by a 2D comb (middle-left), which results in the sampled image (bottom-left). In the spatial-frequency domain, the image's spectrum (top-right) is convolved with the comb's spectrum (middle-right), resulting in a duplicated spectrum of the image (bottom-right).
Note
The images above are a simulation of sampling an analog image. In reality, and as stated above, the spectrum of the image is replicated in all directions forever (i.e. it's not limited to the few copies shown here).
The key idea to visualize here is that the spectrum is replicated on a regular grid.
If we assume that Nyquist-Shannon is satisfied, we can reconstruct the original 2D image without any loss just from its samples. This is done by removing the copies of the spectrum created by the sampling operation. Since the copies are placed on a regular grid, the ideal reconstruction filter has the shape of a square in the frequency domain:
This corresponds to a convolution by the separable, but anisotropic, filter $sinc(x, y) = sinc(x)\,sinc(y)$ in the spatial domain.
This filter is not isotropic, which can be problematic, especially when dealing with rotations. The isotropic version of this ideal reconstruction filter has the shape of a disk in the frequency domain:
This corresponds to a convolution by the isotropic, RBF (non-separable) jinc filter in the spatial domain:

$$jinc(\rho) = \frac{2\,J_1(\pi \rho)}{\pi \rho}$$

with $\rho = \sqrt{x^2 + y^2}$ and $J_1$ the Bessel function of the first kind of order 1.
Important
The sinc filter (as opposed to jinc) yields a wrong "reconstruction" filter that ends up significantly sharpening the image's high frequencies:
Because the disk has a smaller area than the square, this ideal, isotropic, reconstruction filter will blur a little bit more than the square filter. The disk reconstruction filter is ideal if the original signal is properly band-limited (i.e. doesn't have spectral content outside of that disk).
Note
We just described the ideal reconstruction filter. It cannot be implemented in practice because it requires infinite support (i.e. infinite length). Instead, we use a windowed version of the ideal filter, such as Lanczos, or other low-pass filters.
It's not actually possible to reconstruct our original "square" image using an ideal reconstruction filter because that filter cannot be implemented (here we have a truncated version of it), and our original image wasn't band-limited in the first place:
Reconstruction of an improperly band-limited image, using a truncated ideal sinc filter, yields anisotropic ringing artifacts.
Just to recap, we’ve just shown that:
- Separable $sinc(x, y)$ is the ideal 2D reconstruction filter.
- Radial $jinc(\rho)$ is the ideal isotropic reconstruction filter.
- Radial $sinc(\rho)$ is just completely wrong.
Given that we found the radial $sinc(\rho)$ to be incorrect, we should expect the radial application of Lanczos — a windowed sinc — to behave just as badly. Its frequency response confirms it:
Radial profiles of Lanczos-2 and -3 FFTs. High frequencies are heavily boosted.
And clearly it is incorrect. As can be seen above, the radial application of Lanczos, while isotropic, is a bad reconstruction filter: it overly boosts high frequencies while letting through a lot of the frequencies past the ideal cut-off. The result is a sharpened image without effective prevention of aliasing artifacts.
Just like with the ideal reconstruction filter, the correct isotropic version of Lanczos is obtained by replacing $sinc$ with $jinc$:

$$L_a(\rho) = \begin{cases} jinc(\rho)\,jinc(\rho/a) & \text{if } \rho < a \\ 0 & \text{otherwise} \end{cases}$$

with $\rho = \sqrt{x^2 + y^2}$.

Radial profiles of the $jinc$-based Lanczos-2 and -3 FFTs.

Unfortunately, this filter kernel is computationally intensive as it uses the Bessel function $J_1$, which has no closed-form expression.
The separable application of the Lanczos filter is actually correct — albeit not isotropic — and it is not an approximation of the radial application of 1D Lanczos, unlike what can often be read on the Internet. In that regard, the Lanczos Wikipedia page is correct.
The correct radial and isotropic Lanczos application uses a modified Lanczos equation in which $sinc$ is replaced by $jinc$. The radial application of the unmodified 1D Lanczos filter, $L(\rho)$, is simply incorrect.
We've seen above that to be able to fully reconstruct the original 2D image from its samples, the sampling operation needed to satisfy the Nyquist-Shannon theorem. Failing to do so causes the part of the spectrum above the Nyquist frequency to fold back onto the part of the spectrum below it, effectively destroying that part of the signal. This folding is due to the infinite replication of the spectrum at every multiple of the Nyquist frequency.
Below is an illustration of this effect in 1-D. This shows the spectrum over time of a frequency sweep between 0 and 6KHz, for a sampling rate of 10KHz. The Nyquist frequency is therefore 5KHz, and the part of the spectrum between 5KHz and 6KHz is folded back around 5KHz, destroying the signal between 4KHz and 5KHz.
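The folding in the example above is easy to reproduce numerically. In the sketch below (pure Python, using the same example values as the text), a 6 kHz sine sampled at 10 kHz produces exactly the same samples as a phase-inverted 4 kHz sine, i.e. the content above Nyquist reappears folded back at 4 kHz:

```python
import math

fs = 10_000.0          # sampling rate (Hz); Nyquist is 5 kHz
f_in = 6_000.0         # input frequency, 1 kHz above Nyquist
f_folded = fs - f_in   # 4 kHz: where the aliased tone lands

# Sample both tones at the same instants.
n = range(32)
aliased = [math.sin(2 * math.pi * f_in * k / fs) for k in n]
folded = [-math.sin(2 * math.pi * f_folded * k / fs) for k in n]
# The two sample sequences are indistinguishable: once sampled, there is
# no way to tell the 6 kHz tone from the folded 4 kHz tone.
```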
By default, rasterizing a triangle on the GPU does not satisfy the Nyquist-Shannon constraint, and this often manifests with moiré patterns in areas of high frequencies, which become unrecoverable.
Aliasing can be seen in the distance. Low frequencies appear where there
should be none.
MSAA and mipmapping are two mechanisms GPUs can use to help mitigate this during rasterization. MSAA addresses aliasing due to sampling the geometry, while mipmapping addresses aliasing due to sampling textures.
Note
Shading computations can also create high frequencies, for example with specular highlights.
MSAA and mipmapping effectively approximate sampling a band-limited image.
Note
This explains why most spatial upscalers, such as FSR1 or SGSR1, work better with a "well anti-aliased" source image, a properly band-limited sampled image. Such an image can be better reconstructed according to Nyquist-Shannon. In essence, the sampled image has more high-frequency information preserved (i.e.: it contains more of the original image).
Mathematically, anti-aliasing corresponds to sampling the signal (here an image) at a higher rate and applying a low-pass filter to that (this is called super-sampling anti-aliasing, or SSAA).
When we sample the image at a higher rate, we effectively push higher the frequencies destroyed by the overlap of the replicated spectra. In other words, the frequencies that would have been destroyed at the lower sampling rate are now intact (or at least not affected as much). Of course, our image is now of higher resolution, so we need to re-sample it. This time however, we first apply a digital low-pass filter, satisfying Nyquist-Shannon.
256x Anti-aliasing using the separable Lanczos-2 low-pass filter. The moiré
pattern is reduced.
This is why "good" anti-aliasing is not "just the average" of the samples taken within a pixel. Averaging corresponds to a box filter, which is a lousy low-pass filter:
Examples of various filters' frequency-response profiles. We use the radial version of the filters for illustration.
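The difference between a box average and a proper low-pass filter can be checked by evaluating each filter's DTFT magnitude at a frequency well above the cutoff for 4× downsampling (1/8 cycles per input sample). The sketch below is my own quick check, not code from Filament:

```python
import math

def sinc(x):
    return 1.0 if x == 0.0 else math.sin(math.pi * x) / (math.pi * x)

def lanczos2(x):
    # 1D Lanczos-2 kernel.
    return sinc(x) * sinc(x / 2.0) if abs(x) < 2.0 else 0.0

def dtft_mag(taps, f):
    # Magnitude of the discrete-time Fourier transform at frequency f
    # (in cycles per input sample); taps are indexed around their center.
    half = (len(taps) - 1) // 2
    re = sum(h * math.cos(2 * math.pi * f * (k - half)) for k, h in enumerate(taps))
    im = sum(h * math.sin(2 * math.pi * f * (k - half)) for k, h in enumerate(taps))
    return math.hypot(re, im)

# 4x downsampling: "just the average" is a 4-tap box filter.
box = [0.25] * 4
# Lanczos-2 sized for the output resolution (support of 4 output pixels,
# i.e. 16 input samples), sampled at input-pixel spacing and normalized.
lcz = [lanczos2(k / 4.0) for k in range(-7, 8)]
lcz = [h / sum(lcz) for h in lcz]

f = 0.3  # a stopband frequency (the cutoff for 4x reduction is 1/8)
```

At `f = 0.3` the box filter leaks far more of the stopband energy than the Lanczos-2 kernel, which is why averaging alone makes for poor anti-aliasing.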
We render the image at a higher resolution than the desired output resolution, for instance 4x. The higher sampling rate allows higher frequencies in the image to be preserved by the sampling process, instead of being folded back into the image due to the spectrum replication discussed earlier.
The output image is then reconstructed by applying a low-pass filter satisfying the Nyquist frequency at the output resolution. Output, anti-aliased, samples are reconstructed one at a time by applying the low-pass filter, for example Lanczos-2. This precisely corresponds to the figure below:
The anti-aliased sample is reconstructed by calculating the weighted sum of each high-resolution sample with the kernel value at that sample's location. This is called a convolution:

$$c = \sum_i w_i\,s_i \qquad \text{with} \qquad w_i = L(x_i, y_i)$$

where $s_i$ are the high-resolution samples and $(x_i, y_i)$ their positions relative to the kernel's center. Importantly, the result needs to be normalized by the sum of the kernel weights:

$$c = \frac{\sum_i w_i\,s_i}{\sum_i w_i}$$
Notice that the kernel is centered on the sample to be reconstructed, and the width of its first lobe is two low-resolution (anti-aliased) pixels. This exactly matches the corresponding reconstruction filter for the target resolution image.
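The convolution and normalization steps above can be sketched as follows (hypothetical sample data; `lanczos2` and `reconstruct` are my own names, and the kernel distances are expressed in output pixels as described in the text):

```python
import math

def lanczos2(x):
    # 1D Lanczos-2 kernel.
    if abs(x) >= 2.0:
        return 0.0
    if x == 0.0:
        return 1.0
    s = lambda t: math.sin(math.pi * t) / (math.pi * t)
    return s(x) * s(x / 2.0)

def reconstruct(samples, scale):
    # samples: dict {(ix, iy): value} on the high-resolution grid, with
    # positions in high-res pixels relative to the output pixel center.
    # scale: high-res samples per output pixel (e.g. 4 for 4x SSAA).
    num = 0.0
    den = 0.0
    for (ix, iy), v in samples.items():
        # Kernel distances are measured in *output* pixels, so the
        # separable Lanczos-2 support covers scale * 4 input samples.
        w = lanczos2(ix / scale) * lanczos2(iy / scale)
        num += w * v
        den += w
    return num / den  # normalize by the sum of the kernel weights
```

The normalization is what makes the filter a no-op on a constant image: feeding a uniform grid of 0.5 returns exactly 0.5.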
Temporal anti-aliasing aims to spread the filter computation over multiple frames to reduce the computation demands.
Each frame contributes a different subset of the samples, producing a partial filter result that is accumulated with the previous frames' results.
Note
We can do all the analysis in one dimension because we've seen earlier that we can use separable reconstruction filters, which consist of 1-D filters applied horizontally first and then vertically.
Each frame is rendered at the same resolution — or sampling rate — as the target image, but is offset in order to take a different sample. A partial low-pass filter is applied and the result is accumulated to the output image, as per the equation above.
Here we get a subset of the samples each frame, specifically one per output pixel, but we apply exactly the same filter as in the spatial case: it has the same center and size as before.
Note
This is what is often called "unjittering" in many TAA implementations (because the kernel is offset by the jitter-offset to keep it centered on the target pixel-center). "Unjittering" is a bit of a misnomer, as all we're doing is keeping the kernel centered on the target pixel.
As seen previously, each frame is calculating a partial result, $c_n$, the normalized partial filter output:

$$c_n = \frac{\sum_i w_{n,i}\,s_{n,i}}{\sum_i w_{n,i}}$$

where $s_{n,i}$ are the samples taken at frame $n$ and $w_{n,i}$ their kernel weights. This partial result is accumulated with the previous frames' results using exponential smoothing with a feedback parameter $\alpha$:

$$y_n = \alpha\,c_n + (1 - \alpha)\,y_{n-1}$$

This operation converges to the average of the $c_n$. Because the jitter offsets repeat with a period of $N$ frames, the sequence $c_n$ is periodic of period $N$; unrolling the recurrence over the last $m$ frames gives:

$$y_n = \alpha \sum_{k=0}^{m-1} (1 - \alpha)^k\,c_{n-k} + (1 - \alpha)^m\,y_{n-m}$$

And finally, when $\alpha \to 0$, the steady-state solution tends to:

$$\lim_{\alpha \to 0} y_n = \frac{1}{N} \sum_{k=1}^{N} c_k$$

which is the average of the $N$ partial filter results over one jitter period.

Note
At the other limit, $\alpha = 1$, the history is ignored entirely and the output is just the current frame's partial result.

So in the end, the partial filter results are essentially averaged, provided that $\alpha$ is small.

The averaging over $N$ frames is not quite what we want, however: the spatial anti-aliasing filter is the weighted average of all the samples, normalized by the sum of all the kernel weights, whereas averaging the per-frame normalized results $c_n$ weights each frame equally, regardless of its kernel weight sum $w_n = \sum_i w_{n,i}$.

It is however possible to fix this problem by multiplying the accumulation factor $\alpha$ by the partial filter's normalization factor $w_n$:

$$y_n = \alpha\,w_n\,c_n + (1 - \alpha\,w_n)\,y_{n-1}$$

Note
As a nice side effect, this also removes the possible divide-by-0 when $w_n = 0$, because $w_n\,c_n = \sum_i w_{n,i}\,s_{n,i}$ no longer requires the division by $w_n$.

The steady-state sequence $y_n$ satisfies the recurrence above, and because it's periodic of period $N$, we have $y_{n+N} = y_n$.

Now, let's compare $y_{n+N}$ to $y_n$ by unrolling the recurrence over one full period:

$$y_{n+N} = y_n \prod_{k=n+1}^{n+N} (1 - \alpha w_k) + \alpha \sum_{k=n+1}^{n+N} w_k\,c_k \prod_{j=k+1}^{n+N} (1 - \alpha w_j)$$

Because the sequence is in steady state, we can now substitute the equality $y_{n+N} = y_n$ in the equation above:

$$y_n \left(1 - \prod_{k=n+1}^{n+N} (1 - \alpha w_k)\right) = \alpha \sum_{k=n+1}^{n+N} w_k\,c_k \prod_{j=k+1}^{n+N} (1 - \alpha w_j)$$

Rearrange and divide by $\alpha$:

$$y_n\,\frac{1 - \prod_{k=n+1}^{n+N} (1 - \alpha w_k)}{\alpha} = \sum_{k=n+1}^{n+N} w_k\,c_k \prod_{j=k+1}^{n+N} (1 - \alpha w_j)$$

Now, take the limit as $\alpha \to 0$:

$$\lim_{\alpha \to 0} \frac{1 - \prod_{k} (1 - \alpha w_k)}{\alpha} = \sum_{k} w_k \qquad \text{and} \qquad \lim_{\alpha \to 0} \prod_{j} (1 - \alpha w_j) = 1$$

because neither $w_k$ nor $c_k$ depends on $\alpha$. And since the right-hand side then simply becomes $\sum_k w_k\,c_k$, we get:

$$\lim_{\alpha \to 0} y_n = \frac{\sum_{k=1}^{N} w_k\,c_k}{\sum_{k=1}^{N} w_k}$$

And finally:

$$\lim_{\alpha \to 0} y_n = \frac{\sum_{k=1}^{N} \sum_i w_{k,i}\,s_{k,i}}{\sum_{k=1}^{N} \sum_i w_{k,i}}$$

which is exactly the result of the full, normalized, spatial anti-aliasing filter.

As demonstrated here, multiplying $\alpha$ by the partial filter's normalization factor $w_n$ makes the exponential accumulation converge to the properly weighted, fully normalized, anti-aliasing filter.
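This convergence result is easy to verify numerically. The sketch below (with an example periodic sequence of partial results and weights of my own choosing) runs the modulated accumulation $y_n = \alpha w_n c_n + (1 - \alpha w_n)\,y_{n-1}$ and checks that it approaches the weighted average as $\alpha$ gets small:

```python
# Periodic partial filter results c_n with per-frame kernel weight sums w_n
# (arbitrary example values; any periodic sequence works).
c = [0.9, 0.2, 0.5, 0.7]
w = [1.3, 0.4, 0.8, 1.0]

# The fully normalized spatial filter result we expect to converge to.
target = sum(wk * ck for wk, ck in zip(w, c)) / sum(w)

def accumulate(alpha, frames, y0=0.0):
    # Exponential accumulation with the feedback modulated by the
    # partial filter's normalization factor w_n.
    y = y0
    for n in range(frames):
        cn, wn = c[n % len(c)], w[n % len(w)]
        y = alpha * wn * cn + (1.0 - alpha * wn) * y
    return y
```

Running this with progressively smaller $\alpha$ shows the steady state landing ever closer to the weighted average, as the derivation predicts.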
In practice when using filters that have negative lobes, like the Lanczos filter, we need to ensure that the resulting sample is not negative. This is called "deringing". A negative output from Lanczos is possible because of its negative weights. This typically happens in areas of high contrast:
Lanczos-2 produces a negative output sample.
The deringing operation is not linear and not commutative. In practice this problem is often ignored. An alternative is to use a filter without negative lobes, but such filters generally produce a blurrier image.
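One common deringing approach (a generic sketch, not necessarily Filament's exact method) is to clamp the filter output to the range spanned by the contributing samples, which guarantees no new extrema and, in particular, no negative values for non-negative inputs:

```python
def dering(filtered, samples):
    # Clamp the filter output to the min/max of its input samples,
    # removing the over/undershoot caused by the kernel's negative lobes.
    return max(min(filtered, max(samples)), min(samples))
```

This is the non-linear step mentioned above: clamping before or after the temporal accumulation gives different results, which is why it doesn't commute with the rest of the pipeline.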
Something to realize is that temporal upscaling, as in FSR2 and others, is not actually upscaling in the sense of "resampling" or "video upscaling". It is in fact just good old temporal anti-aliasing, except that each frame produces fewer samples; ultimately, though, all the samples become available. This is why the term "super resolution" is more appropriate.
In the example below, each frame produces half as many samples, so we need twice the number of frames to converge to the same result.
Critically, the width of the filter is unchanged compared to the TAA case; it is still sized like a reconstruction filter at the output resolution:
Note
Many upscaling TAA implementations use a filter sized for the input resolution, which in my opinion is incorrect; to compensate for this, they modulate the history accumulation factor (often called the "blend factor") in various ways.
Because the partial filter is now computed with fewer samples, while keeping the same support, we can end up with a partial result whose normalization factor is much smaller than 1. This is a key difference compared to TAA. It becomes essential to modulate the accumulation factor $\alpha$ by the partial filter's normalization factor (the sum of its kernel weights).
Note
Each partial filter computation can be thought of as computing a low-pass filter with a cutoff frequency above the Nyquist frequency. While this seems counter-intuitive, the weighted exponential accumulation converges to the proper low-pass filter at the target resolution.
Just like in the TAA case, deringing is a problem, and it is in fact more of a problem with upscaling, as intermediate frames are more prone to outputting a negative sample, because some of them lack a sample with a strong positive weight.
There are two correct 2D applications of the Lanczos filter. The separable application:

$$L(x, y) = L(x)\,L(y)$$

And the radial basis function application, which uses a modified Lanczos filter in which $sinc$ is replaced by $jinc$:

$$L_a(\rho) = jinc(\rho)\,jinc(\rho/a), \qquad \rho = \sqrt{x^2 + y^2}$$

with $jinc(\rho) = \frac{2\,J_1(\pi\rho)}{\pi\rho}$. In particular, defining the 2D filter as the radial application of the unmodified 1D Lanczos kernel, $L(\sqrt{x^2 + y^2})$, is incorrect and results in over-sharpening.
The history accumulation feedback parameter $\alpha$ must be multiplied by the partial filter's normalization factor (the sum of its kernel weights). This allows the exponential smoothing to converge towards the same result as the spatial anti-aliasing filter at the limit $\alpha \to 0$.
Temporal upscaling is just temporal anti-aliasing with less than one sample per frame, per output pixel. However, the filter stays exactly the same; its support, in particular, must remain relative to the output resolution.