About Oversampling and Aliasing in Digital Compression
Updated: Mar 5, 2020
In a digital signal this convolution of spectra is a circular convolution, so anything that lands beyond the Nyquist frequency aliases 😞. Because convolving two spectra expands the bandwidth (the result can span the sum of both bandwidths), aliasing is always a possibility.
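The circular-convolution argument can be checked numerically with a tiny DFT. This is a minimal sketch with illustrative numbers that are not from the post: a base rate of 48 kHz, 48 samples (so each DFT bin is 1 kHz), a 15 kHz "source" tone and a 14 kHz "control" tone. Their product contains 29 kHz, which is beyond the 24 kHz Nyquist frequency, so it wraps around to 48 - 29 = 19 kHz.

```python
import cmath
import math

def dft_magnitudes(x):
    """Naive O(N^2) DFT; returns |X[k]| for k = 0..N-1."""
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
            for k in range(n)]

N = 48  # 48 samples at an assumed fs of 48 kHz -> 1 kHz per bin

source = [math.cos(2 * math.pi * 15 * t / N) for t in range(N)]   # 15 kHz tone
control = [math.cos(2 * math.pi * 14 * t / N) for t in range(N)]  # 14 kHz tone
product = [s * c for s, c in zip(source, control)]               # time-domain product

mags = dft_magnitudes(product)
# Only bins 0..N/2 (DC to Nyquist) are unique for a real signal.
peaks = [k for k in range(N // 2 + 1) if mags[k] > 1e-6 * N]
print(peaks)  # [1, 19]
```

The difference tone lands at 1 kHz as expected, but the sum tone shows up at 19 kHz instead of 29 kHz: that is the aliased energy.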
This plot shows the product of two signals where the sum of the bandwidths is smaller than the total available bandwidth (frequencies beyond Nyquist are shaded dark). In this case it is easy to see that, given the spectra of the source and control signals, oversampling would be unnecessary.
In contrast, this plot shows the time-domain product of a source and a control signal whose summed bandwidth exceeds the Nyquist frequency. Accordingly, the product computed at the 1X sample rate differs from the one computed at 2X: the 2X version extends into the shaded area, while at the base rate (blue trace) that energy is reflected back into the baseband.
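The two cases above reduce to one comparison: whether the summed bandwidth of source and control fits under fs/2. A small sketch of that check, plus the folding rule that tells where an out-of-band component ends up. The function names and the 15 kHz / 14 kHz / 48 kHz numbers are illustrative assumptions, not taken from the plots.

```python
def alias_frequency(f, fs):
    """Frequency at which a real component at f appears when sampled at fs."""
    f = f % fs                             # the sampled spectrum repeats every fs
    return fs - f if f > fs / 2 else f     # then folds about Nyquist (fs / 2)

def product_is_alias_free(b_source, b_control, fs):
    """A product's bandwidth is at most the sum of the two bandwidths."""
    return b_source + b_control <= fs / 2

# Source up to 15 kHz, control up to 14 kHz -> product reaches 29 kHz.
print(product_is_alias_free(15, 14, 48), alias_frequency(29, 48))  # False 19
print(product_is_alias_free(15, 14, 96), alias_frequency(29, 96))  # True 29
```

At 1X (48 kHz) the 29 kHz component folds down to 19 kHz; at 2X (96 kHz) it stays where it belongs, which is exactly the difference between the two traces.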
Because of this problem, oversampling is often touted as the “perfect! There shall be no more problems” solution. However, REAL-TIME oversampling is not a trivial procedure. The process is as follows: upsample -> anti-imaging filter -> non-linearity -> anti-aliasing filter -> downsample
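The five-step chain can be sketched end to end in plain Python. This is a minimal, non-real-time sketch under stated assumptions: the function names are hypothetical, and it swaps the post's 12th-order elliptic IIR filters for a Blackman-windowed-sinc FIR, simply because that is easy to write without libraries. Both filters cut off at the base-rate Nyquist.

```python
import math

def windowed_sinc_lowpass(cutoff, num_taps, gain=1.0):
    """Blackman-windowed-sinc FIR low-pass; cutoff in cycles/sample (0 < cutoff < 0.5).
    A stand-in for the elliptic filters used in the post."""
    m = num_taps - 1
    taps = []
    for i in range(num_taps):
        offset = i - m / 2
        ideal = (2 * cutoff if offset == 0
                 else math.sin(2 * math.pi * cutoff * offset) / (math.pi * offset))
        window = (0.42 - 0.5 * math.cos(2 * math.pi * i / m)
                  + 0.08 * math.cos(4 * math.pi * i / m))
        taps.append(ideal * window)
    scale = gain / sum(taps)          # set the DC gain exactly to `gain`
    return [t * scale for t in taps]

def fir_filter(x, taps):
    """Direct-form FIR with zero initial state; output has the same length as x."""
    return [sum(taps[k] * x[n - k] for k in range(len(taps)) if n >= k)
            for n in range(len(x))]

def oversampled_nonlinearity(x, factor, nonlinearity, num_taps=63):
    # 1. Upsample ("stretching"): insert factor - 1 zeros after every sample.
    up = []
    for s in x:
        up.append(s)
        up.extend([0.0] * (factor - 1))
    # 2. Anti-imaging: keep only the original band (cutoff at the old Nyquist).
    #    Gain = factor makes up for the amplitude lost to the inserted zeros.
    smooth = fir_filter(up, windowed_sinc_lowpass(0.5 / factor, num_taps, gain=factor))
    # 3. Non-linearity at the high rate, where its expanded spectrum has room.
    shaped = [nonlinearity(s) for s in smooth]
    # 4. Anti-aliasing: remove everything above the base-rate Nyquist.
    band_limited = fir_filter(shaped, windowed_sinc_lowpass(0.5 / factor, num_taps))
    # 5. Downsample: keep one sample out of every `factor`.
    return band_limited[::factor]

# Usage: square a constant 0.5 at 4x; after the filter transients it settles at 0.25.
out = oversampled_nonlinearity([0.5] * 64, 4, lambda s: s * s)
print(round(out[-1], 4))
```

Even this toy version shows why real-time use is not trivial: every input sample now costs two filters running at `factor` times the base rate, plus the latency of both filters.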
Quick note: upsampling is also known as stretching (or zero-stuffing): it inserts n-1 zeros between samples, increasing the sample rate, and therefore the available bandwidth, n times. Downsampling is also known as decimation: it discards n-1 out of every n samples, reducing the sample rate and the available bandwidth n times.
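Both operations are one-liners on a list of samples. A minimal sketch (function names are illustrative), without the filters that must surround them in practice:

```python
def upsample(x, n):
    """Stretching / zero-stuffing: insert n-1 zeros after every sample."""
    out = []
    for s in x:
        out.append(s)
        out.extend([0.0] * (n - 1))
    return out

def downsample(x, n):
    """Decimation in the sense used here: keep one sample out of every n."""
    return x[::n]

y = upsample([1.0, 2.0, 3.0], 3)
print(y)                 # [1.0, 0.0, 0.0, 2.0, 0.0, 0.0, 3.0, 0.0, 0.0]
print(downsample(y, 3))  # [1.0, 2.0, 3.0]
```

The round trip is lossless here only because nothing happened in between; once a non-linearity sits between the two, the filtering steps decide how much imaging and aliasing survives.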
The following figure shows this process, both in the time domain and spectrally, applied to a nicely shaped spectrum. The non-linearity applied is x^2! This is a key element, because squaring a signal is easy to understand from several perspectives:
Squaring is multiplying a signal by itself, which is equivalent to convolving its spectrum with itself… so the bandwidth can at most double.
Squaring a sinusoid doubles its frequency and adds a DC offset (this is just a trig property; look up the power-reduction formulas). (I removed the DC offset after the non-linearity as it's not relevant for our purposes.)
Squaring adds second-order distortion: second harmonics, plus sum and difference tones when there are several components. Since the highest new frequency is twice the highest input frequency, the bandwidth at most doubles.
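All three perspectives can be verified with a DFT. In this sketch (bin numbers are arbitrary choices, not from the post), squaring a single tone at bin 5 leaves only DC and bin 10, and squaring two tones at bins 5 and 7 produces DC, the difference (2), the second harmonics (10, 14) and the sum (12). Nothing lands above 2 × 7 = 14.

```python
import cmath
import math

def dft_magnitudes(x):
    """Naive O(N^2) DFT; returns |X[k]| for k = 0..N-1."""
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
            for k in range(n)]

def occupied_bins(x):
    """Bins between DC and Nyquist that carry non-negligible energy."""
    mags = dft_magnitudes(x)
    return [k for k in range(len(x) // 2 + 1) if mags[k] > 1e-6 * len(x)]

N = 64
one_tone = [math.cos(2 * math.pi * 5 * t / N) for t in range(N)]
two_tones = [math.cos(2 * math.pi * 5 * t / N) + math.cos(2 * math.pi * 7 * t / N)
             for t in range(N)]

peaks_single = occupied_bins([s * s for s in one_tone])
peaks_two = occupied_bins([s * s for s in two_tones])
print(peaks_single)  # [0, 10]
print(peaks_two)     # [0, 2, 10, 12, 14]
```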
Alright, from top to bottom: the first two plots show the original signal. Next is the upsampled signal: it has zeros between samples and repeated spectra (images), and since it runs at 8x the sample rate, there are more samples in the same amount of time.
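The repeated spectra come straight from the zeros: stretching a signal by 8 makes its DFT repeat the original spectrum 8 times. A small sketch with assumed sizes (a 16-sample tone at bin 3, stretched to 128 samples) shows every image at full strength:

```python
import cmath
import math

def dft_magnitudes(x):
    """Naive O(N^2) DFT; returns |X[k]| for k = 0..N-1."""
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
            for k in range(n)]

BASE_N, FACTOR = 16, 8
tone = [math.cos(2 * math.pi * 3 * t / BASE_N) for t in range(BASE_N)]  # bin 3 at base rate

stretched = []
for s in tone:
    stretched.append(s)
    stretched.extend([0.0] * (FACTOR - 1))   # insert FACTOR - 1 zeros

n = len(stretched)
mags = dft_magnitudes(stretched)
peaks = [k for k in range(n // 2 + 1) if mags[k] > 1e-6 * n]
print(peaks)  # [3, 13, 19, 29, 35, 45, 51, 61]
```

Only bin 3 is the wanted signal; the other seven peaks (at multiples of 16 ± 3) are images with the same amplitude, which is why the anti-imaging filter has to come next.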
Next is the anti-imaged signal: the repeated spectra are removed with a 12th-order elliptic filter. Note that this filtering is not perfect: some image energy always remains!
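How imperfect the filtering is can be read off the filter's frequency response at the nearest image. This sketch does not reproduce the post's elliptic filter; it assumes a 127-tap Blackman-windowed-sinc FIR (a swapped-in design that needs no libraries), 8x oversampling and a tone at 0.2 times the base sample rate. For a tone at f0 (as a fraction of the base rate), the nearest image sits at (1 - f0) / 8 in high-rate units.

```python
import cmath
import math

def windowed_sinc_lowpass(cutoff, num_taps):
    """Blackman-windowed-sinc FIR low-pass, unity DC gain; cutoff in cycles/sample."""
    m = num_taps - 1
    taps = []
    for i in range(num_taps):
        offset = i - m / 2
        ideal = (2 * cutoff if offset == 0
                 else math.sin(2 * math.pi * cutoff * offset) / (math.pi * offset))
        window = (0.42 - 0.5 * math.cos(2 * math.pi * i / m)
                  + 0.08 * math.cos(4 * math.pi * i / m))
        taps.append(ideal * window)
    total = sum(taps)
    return [t / total for t in taps]

def response_db(taps, f):
    """Magnitude response in dB at normalized frequency f (cycles/sample)."""
    h = sum(t * cmath.exp(-2j * math.pi * f * i) for i, t in enumerate(taps))
    return 20 * math.log10(abs(h))

FACTOR = 8
taps = windowed_sinc_lowpass(0.5 / FACTOR, 127)   # cutoff at the base-rate Nyquist

f0 = 0.2                                              # tone, fraction of base fs
passband = response_db(taps, f0 / FACTOR)             # the wanted tone
first_image = response_db(taps, (1 - f0) / FACTOR)    # nearest image
print(f"tone: {passband:.3f} dB, nearest image: {first_image:.1f} dB")
```

The image is strongly attenuated but never zero. An elliptic IIR, as used in the post, would typically reach a steeper transition with far fewer multiplications; the FIR is used here only because it fits in a few lines.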
Next we apply the non-linearity and, as expected, the bandwidth doubles. Check the time domain to verify that the post-non-linearity signal is indeed the square of the anti-imaged signal.
Next we apply the anti-aliasing filter and get a band-limited spectrum (usually the same filter design as the anti-imaging one can be used).
Finally we decimate. At this point, everything in the shaded area WILL alias, as it doesn't have anywhere else to go; also, there will ALWAYS be some energy there after this process. In this case it is more than 60 dB down, but honestly I don't have any experimental data to say that this is indeed an imperceptible amount. Also, this is only an example, and it is fairly forgiving in its choice of non-linearity and filters, which could be further optimized.
OK, let's make things interesting. This plot shows the same process with the original signal band limited a bit. Will the final aliasing be reduced in the end? Absolutely! The spectral expansion is not as extreme now, and there's some guard band to better filter images and aliases.
What if we band limited the signal so that the spectral expansion stayed inside the original Nyquist frequency? For x^2, that means band limiting the signal to half the Nyquist frequency, i.e. a quarter of the sample rate (e.g. low-passing at 20 kHz at fs = 96 kHz, where Nyquist is 48 kHz and the squared signal reaches at most 40 kHz).
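The guard-band arithmetic fits in one function. A sketch assuming a polynomial non-linearity of degree d, whose output bandwidth is at most d times the input bandwidth (d = 2 for the x^2 example). A compressor is not a polynomial, so this bound only covers the squaring demo; the function name is hypothetical.

```python
def max_alias_free_bandwidth(fs, degree):
    """Largest source bandwidth (Hz) whose degree-th power stays below fs / 2.

    A degree-d polynomial convolves the spectrum with itself d - 1 times,
    so a bandwidth B grows to at most d * B."""
    return fs / 2 / degree

fs = 96_000
print(max_alias_free_bandwidth(fs, 2))  # 24000.0
# The post's example: low-passing at 20 kHz, then squaring, needs 40 kHz <= 48 kHz.
print(2 * 20_000 <= fs / 2)             # True
```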
Finally, a close-up view of the three spectra right at the edge of Nyquist to show their differences.
This is a snare drum sample, which we are going to process with a compressor using both a peak detector and an RMS detector (just for reference: threshold -30 dB, ratio 2:1, peak attack 1 ms, peak release 100 ms, RMS time 50 ms; there's some reasoning behind these settings, but it's not that relevant to the analysis). From top to bottom: the source signal (the snare), both control signals in time and frequency, and the signals compressed with each control signal. Everything in the shaded area becomes aliasing when returning to the base sample rate. It's not much, but the question I want to tackle is how band limiting and oversampling can help, along with a basic attempt to quantify the different situations.
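For readers who want to reproduce something similar, here is a minimal compressor sketch using the settings listed above. The post does not say how its detectors are built, so everything structural is an assumption: one-pole smoothers with coefficient exp(-1 / (tau * fs)), a hard-knee gain computer, and fs = 48 kHz. All function names are hypothetical. The final multiplication in `compress` is the time-domain product whose spectrum is the convolution discussed at the start.

```python
import math

FS = 48_000  # assumed sample rate; the post does not state it

def one_pole_coeff(time_s, fs):
    """Smoothing coefficient for a one-pole filter with time constant time_s (assumed form)."""
    return math.exp(-1.0 / (time_s * fs))

def peak_detector(x, fs, attack_s=0.001, release_s=0.100):
    """Peak envelope: fast one-pole attack, slow one-pole release on |x|."""
    a_att, a_rel = one_pole_coeff(attack_s, fs), one_pole_coeff(release_s, fs)
    env, out = 0.0, []
    for s in x:
        level = abs(s)
        a = a_att if level > env else a_rel
        env = a * env + (1.0 - a) * level
        out.append(env)
    return out

def rms_detector(x, fs, time_s=0.050):
    """RMS envelope: one-pole average of x**2, then square root."""
    a = one_pole_coeff(time_s, fs)
    power, out = 0.0, []
    for s in x:
        power = a * power + (1.0 - a) * s * s
        out.append(math.sqrt(power))
    return out

def gain_computer(level, threshold_db=-30.0, ratio=2.0):
    """Hard-knee static curve: linear gain for a linear detector level."""
    level_db = 20.0 * math.log10(max(level, 1e-12))   # floor avoids log10(0)
    over_db = level_db - threshold_db
    gain_db = -over_db * (1.0 - 1.0 / ratio) if over_db > 0 else 0.0
    return 10.0 ** (gain_db / 20.0)

def compress(x, control):
    """Multiply the source by the gain derived from the control (detector) signal."""
    return [s * gain_computer(c) for s, c in zip(x, control)]

# Usage: one second of a full-scale constant (0 dBFS) is 30 dB over threshold,
# so at 2:1 both detectors settle at 15 dB of gain reduction.
x = [1.0] * FS
y_pk = compress(x, peak_detector(x, FS))
y_rms = compress(x, rms_detector(x, FS))
print(round(y_pk[-1], 4), round(y_rms[-1], 4))
```

Feeding it a real drum hit instead of a constant is where the sharp release transitions, and with them the wide control-signal spectra, show up.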
This is the same snare sample, now processed with an oversampling algorithm. Again, from top to bottom: the original source signal, the oversampled signal, both control signals (peak detected and RMS detected), and finally the two compressed signals, using the peak control signal and the RMS control signal respectively.
To quantify the aliasing, I decided (because I've never seen this formally done) to compute the out-of-baseband energy (which will later alias) and the in-band energy. The ratio between them gives the aliased energy as a percentage of the non-aliased energy (similar to THD-F from IEEE?). I evaluated three options, each switched on or off: band limiting the source, band limiting the control signal, and smoothing the control signal.
A quick note about smoothing the control signal: when the signal drops below the threshold, there's a sharp transition as the gain approaches unity during the release. Such discontinuities are often a source of high-frequency energy, so smoothing that transition should reduce the bandwidth of the control signal. Interestingly, this is comparable to widening the knee of the compressor, since it creates a smooth transition between the compressed and the uncompressed states!
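A minimal sketch of that metric, assuming the signal is analyzed at the oversampled rate just before decimation, with a plain DFT over the whole excerpt and no windowing; `aliased_energy_percent` is a hypothetical name. Only the positive-frequency half is summed, which for a real signal gives the same ratio as the full spectrum apart from the DC and Nyquist bins being counted once.

```python
import cmath
import math

def dft(x):
    """Naive O(N^2) DFT."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def aliased_energy_percent(x, factor):
    """Out-of-baseband energy as a percentage of in-band energy.

    x is a real signal running at `factor` times the base sample rate. Bins
    above the base-rate Nyquist (bin len(x) // (2 * factor)) are the ones that
    fold back into the baseband when decimating."""
    n = len(x)
    spectrum = dft(x)
    base_nyquist_bin = n // (2 * factor)
    in_band = sum(abs(spectrum[k]) ** 2 for k in range(0, base_nyquist_bin + 1))
    out_band = sum(abs(spectrum[k]) ** 2 for k in range(base_nyquist_bin + 1, n // 2 + 1))
    return 100.0 * out_band / in_band

# Usage: a 4x oversampled test signal with the wanted tone at bin 3 and a
# component 20 dB lower at bin 20 (above the base-rate Nyquist bin, which is 8).
N, FACTOR = 64, 4
sig = [math.cos(2 * math.pi * 3 * t / N) + 0.1 * math.cos(2 * math.pi * 20 * t / N)
       for t in range(N)]
print(round(aliased_energy_percent(sig, FACTOR), 3))  # 1.0
```

A component 20 dB down carries 1% of the energy, so the percentage maps directly onto the dB figures in the plots.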
Alright, here's what I have. Because there are just too many plots and combinations, all the figures can be found as PNGs at this link. However, these two tables should give plenty of information.
Some final thoughts. Even though this is not exhaustive in terms of source material, I suspect compression without oversampling doesn't produce huge amounts of aliasing, BUT its perceptibility is still an unknown. Also, compression without oversampling can get down to very, very small amounts of aliasing if some of these techniques are combined. On the other hand, it was interesting to see that there are many possible architectures for oversampled compression. Should the control signal be computed at the base rate, or after the signal has been oversampled? It's not a trivial question! This example used 8x oversampling, but I suspect that anything beyond 2x, when done properly, shouldn't give much advantage. Unless there are good reasons to expect the spectrum to grow that much, oversampling that much can actually increase the aliasing! Imagine doing 100X and getting a very low noise floor: right at the moment of decimation, that noise floor gets folded 99 times into the baseband… so… maybe not that good XD.
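A back-of-the-envelope version of the 100X argument, under a loud simplification: assume the anti-aliasing filter leaves the same flat residual floor in every base-band-wide region above the base-rate Nyquist, and that these residuals add as uncorrelated power. Decimating by M then folds M - 1 such regions into the baseband. A real filter's stopband is not flat and would be redesigned for each factor, so treat this only as a trend; the function name is hypothetical.

```python
import math

def folded_floor_db(band_floor_db, factor):
    """Total residual level landing in the baseband after decimating by `factor`.

    Assumes each of the factor - 1 out-of-band regions carries the same
    residual power (band_floor_db) and that these powers add incoherently."""
    if factor < 2:
        raise ValueError("factor must be at least 2")
    return band_floor_db + 10.0 * math.log10(factor - 1)

for m in (2, 8, 100):
    print(f"{m:>3}x -> {folded_floor_db(-60.0, m):6.1f} dB")
```

With a -60 dB floor per region, 2x folds in one region (-60 dB), 8x folds in seven (about 8.5 dB worse) and 100x folds in 99 (about 20 dB worse), unless the filter's stopband improves by at least as much.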
Finally! Take all of this with skepticism. I really like DSP and have a background in it, but I've never done it in a professional environment (although I wouldn't mind doing it hahaha). I bet there are tons of tricks and details that only come with experience, but I'm just a guy who likes this type of thing XD, nothing more. So if there's any doubt or debate about any of this, I'm happy to discuss :)