AET 2050 - DAW Production


Sampling and Quantization: Quality Issues

The question often arises as to what sample rate and quantization resolution are necessary for a certain quality of audio.

 

Psychoacoustic Limitations

Digital audio can approach the limits of human hearing in terms of sound quality; however, a poorly engineered product can still sound very poor.

[Image: a budget Behringer mixer. Caption: Not Pictured: Quality Audio]

Choices of sampling parameters, noise shaping, and quantization affect frequency response, distortion, and perceived dynamics

The capabilities of the human ear should be regarded as the standard against which the quality of digital systems is measured. Models based upon human sensitivity are used.

[Image: equal-loudness contours]

Work done by Louis Fielder and Elizabeth Cohen suggests that a dynamic range of 115 dB is required for natural reproduction in a good consumer environment. This figure assumes a noise floor of about 4 dB SPL and a maximum level of 120 to 129 dB SPL for common musical performances in favored listening conditions, and accounts for the limitations of the equipment used.

Sampling Rate

The choice of sampling rate determines the maximum audio bandwidth available. The conventional wisdom is that the sampling rate should be no higher than necessary to reproduce the desired frequencies.

If conventional wisdom states that audible frequency extends to 20 kHz, then sampling should be just over 40 kHz for quality audio. Two standards exist that meet this expectation:

48 kHz leaves enough room for downward varispeed settings. (Varispeed changes speed of the recording medium, which in turn changes the sampling frequency.)

44.1 kHz allows full use of the 20 kHz bandwidth, and oversampling converters allow for quality filtering. It also produces about 8% less data than 48 kHz sampling.

Contemporary systems have doubled the traditional baseband sample rates.  It is very common now to work at 88.2 kHz, 96 kHz, or even 192 kHz.
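To make the trade-off concrete, the per-channel data rates for these rates and common bit depths can be tallied. This is a quick illustrative sketch; the numbers follow directly from sample rate times bits per sample.

```python
# Per-channel data rates for common sample rates and bit depths.
for rate in (44_100, 48_000, 96_000, 192_000):
    for depth in (16, 24):
        print(f"{rate:>7} Hz x {depth} bits = {rate * depth:>9,} bits/s per channel")

# 44.1 kHz carries about 8% less data per second than 48 kHz
# at the same bit depth: 1 - 44_100 / 48_000 is roughly 0.081.
savings = 1 - 44_100 / 48_000
```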

Other common rates (see Rumsey, p.35):

[Image: master clock sample-rate settings]

Quantizing resolution

The number of bits per sample dictates the dynamic range as well as the signal-to-error (or signal-to-noise) ratio of a digital audio system.

The traditional standard for linear PCM systems has been 16 bits. It affords a dynamic range of over 96 dB. Note that this is adequate for most applications, but fails to meet Fielder's spec of 122 dB for subjectively noise-free reproduction. To achieve this level would require 21 bits or more, which is attainable with current technology (DVD, quality converters).
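The 6 dB-per-bit rule behind these figures can be sketched as follows. The helper names are hypothetical; both the roughly 96 dB figure for 16 bits and the 21-bit requirement for a 122 dB target fall out of 20*log10(2), about 6.02 dB per bit.

```python
import math

# Dynamic range of an ideal N-bit linear PCM system: about 6.02 dB
# per bit, i.e. 20*log10(2) for each doubling of quantization levels.
def dynamic_range_db(bits):
    return 20 * math.log10(2 ** bits)

# Smallest bit depth whose dynamic range meets a target in dB.
def bits_needed(target_db):
    return math.ceil(target_db / (20 * math.log10(2)))

print(f"16 bits -> {dynamic_range_db(16):.1f} dB")    # about 96.3 dB
print(f"122 dB target -> {bits_needed(122)} bits")    # 21 bits
```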

See Rumsey p.37 for common bit depths found for certain applications. Note that the original format for Sony PCM recorders was 14 bits, and that many early samplers used only 12 bits!

Any professional audio workstation of value now uses 24 bits as a practical working bit depth. Whereas debate exists as to the optimum sample rate, there is little argument that a larger bit depth results in a better-quality product.

Requantization

It is a good idea to work with digital audio in the deepest bit depth available; however, eventually the product must fit into a delivery standard - today, that standard is 16 bits. This means that bit reduction, or the lowering of the number of bits per sample, must take place.

Bit reduction is a very important process. The lowest bits must not be simply truncated, or quantization error is introduced that is similar to A/D conversion with no dither.

To properly reduce the bit depth, the samples need to first be dithered digitally - that is, a dither signal is calculated and the samples are altered accordingly. The level of the dither must be sufficient to match the new bit level.

Also, the original signal's level should be raised so that its peaks are close to the maximum allowable level. This maintains as much dynamic range as possible. See Rumsey, p.41.
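A minimal sketch of dithered bit reduction, assuming 24-bit integer samples reduced to 16 bits with TPDF (triangular) dither — a common choice, though the text does not prescribe a particular dither shape. The function name is hypothetical.

```python
import random

# Reduce 24-bit integer samples to 16 bits: add dither scaled to the
# new bit level, then round to the coarser grid (never just truncate).
def requantize(samples, drop_bits=8):
    step = 1 << drop_bits                 # size of the new LSB in old units
    out = []
    for s in samples:
        # Triangular-PDF dither spanning +/-1 new LSB: the difference
        # of two uniform random values.
        dither = (random.random() - random.random()) * step
        out.append(int(round((s + dither) / step)))
    return out
```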

Noise shaping can be implemented during requantization. Noise shaping relocates noise energy away from the most critical region (around 4 kHz, where the ear is most sensitive) and moves it up into less audible higher frequencies.

Reducing the midband noise can increase the dynamic range of the signal, almost to the equivalence of a 20-bit signal.

Different curves of noise shaping are available, and the choice as to which is used is subjective. See Rumsey, p.42
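For illustration, a first-order error-feedback loop is the simplest possible noise shaper; real products use higher-order, psychoacoustically weighted filters, as the text notes. The function name and integer sample format here are hypothetical.

```python
# First-order error-feedback noise shaper combined with requantization.
# Each sample's quantization error is subtracted from the next input,
# which pushes the error energy toward higher frequencies.
def noise_shaped_requantize(samples, drop_bits=8):
    step = 1 << drop_bits
    err = 0.0
    out = []
    for s in samples:
        x = s - err                       # feed back the previous error
        q = round(x / step) * step        # requantize to the coarser grid
        err = q - x                       # error to shape into the next sample
        out.append(q // step)
    return out
```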

Error Correction

Errors are a fact of life in digital systems. Eliminating them entirely is impractical. It is a given that errors exist, so a system must be in place to deal with them.

Three steps for error handling:

    1. Detection. Virtually all digital systems implement detection.
    2. Correction. If the redundancy is robust enough, bad data can be corrected so that the original data is maintained.
    3. Concealment. With systems such as audio, if the error cannot be corrected, it can be concealed using interpolation. If the data cannot be interpolated, the most extreme concealment is muting.
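The interpolate-or-mute strategy can be sketched in a few lines. The helper is hypothetical, and real systems interpolate over longer windows than a single neighboring sample on each side.

```python
# Concealment sketch: a sample flagged as bad is replaced with the
# average of its neighbors; if a good neighbor is missing, it is muted.
def conceal(samples, bad):
    out = list(samples)
    for i in bad:
        prev_ok = i > 0 and (i - 1) not in bad
        next_ok = i < len(samples) - 1 and (i + 1) not in bad
        if prev_ok and next_ok:
            out[i] = (samples[i - 1] + samples[i + 1]) / 2   # interpolate
        else:
            out[i] = 0                                       # mute
    return out

print(conceal([10, 20, 999, 40], bad={2}))   # [10, 20, 30.0, 40]
```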

 

With typical computer data, validity is imperative; bad data cannot be replaced with interpolated data. Because of the nature of audio (or video), however, interpolation can be tolerated.

Timing Jitter

Jitter is short-term variation in the positions of audio samples in time. Ideally, all samples occur at exactly equal intervals. Variations in this interval can cause audible distortion.

Jitter has similar properties to quantization error, but along the x-axis.
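A quick numerical sketch shows how timing error turns into amplitude error. Here a 10 kHz sine is sampled at 48 kHz with uniform ±5 ns jitter (the figure quoted later in this section); the setup is illustrative only.

```python
import math, random

# Sample a 10 kHz sine at 48 kHz with uniform +/-5 ns timing jitter
# and find the worst-case amplitude error the jitter introduces.
fs, f, j = 48_000, 10_000, 5e-9
worst = 0.0
for n in range(480):                      # 10 ms of samples
    t = n / fs
    ideal = math.sin(2 * math.pi * f * t)
    actual = math.sin(2 * math.pi * f * (t + random.uniform(-j, j)))
    worst = max(worst, abs(actual - ideal))

# The error is bounded by slope x timing error, 2*pi*f*j, which grows
# with frequency -- one reason jitter is more audible at high frequencies.
lsb = 2 ** -15                            # one 16-bit LSB (full scale = +/-1)
print(f"worst-case error: {worst / lsb:.1f} LSBs of a 16-bit system")
```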

 

[Image: a distorted signal demonstrates how jitter can manifest]

 

Jitter has several potential sources including:

[Images: sources of jitter]

Jitter is quantified in two ways:

The effect of jitter is difficult to quantify, but its presence gives the impression of loss of instrument definition, narrowing of stereo image, collapse of soundstage depth, etc.

[Images: stereo image, before and after jitter]

Jitter is more apparent at higher frequencies than lower frequencies. If it is periodic rather than random, it has an effect similar to 'flutter': for a given test tone, it will produce sidebands spaced at the jitter frequency. While the level of this distortion is low, it can still be a significant artifact. Research shows that a jitter amplitude of only 5 ns can have a significant effect on a 16-bit signal.

Sampling-Rate Conversion

Sampling-rate conversion is a very important and often-used process in digital audio, such as when transferring from a 48k master to a 44.1k CD. Another example would be when two units are running on separate clocks, or when one unit's clock exhibits excessive jitter.

Ideally, conversion would be done by converting to analog and then back to digital at the new rate. (In fact, this is often a viable choice, depending on the equipment being used.) However, there are always the problems of multiple conversions and of filters altering the signal.

A digital solution is preferred. The process involves analyzing the original samples and interpolating new samples to fit the new sampling rate.

There are three basic categories of rate conversion:

    1. Integer ratio. This is the most straightforward conversion. All samples align to the higher sampling rate.
    2. Small fraction ratio. A fixed fractional ratio can be predicted. Some of the samples will periodically time-align.
    3. Variable ratio. There will be no simple relationship between input and output sampling rates. No samples can be predicted to align. This is the most difficult (and most common) process.

Theoretically, the system will interpolate a new set of samples, then discard the extras, leaving only the ones needed for the new rate. In reality, this is an excess of unneeded computations. The practical approach combines interpolation and decimation into one process.

For integer ratio conversion, the process is fairly simple. The samples always line up with the higher rate. This is the process used by oversampling converters when dropping down to the original sample rate.
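A toy example of decimation by an integer factor: filter, then keep every mth sample. The moving average here stands in for the proper anti-aliasing filter a real converter would use.

```python
# Toy decimation by an integer factor m: lowpass filter, then keep
# every mth sample. The moving average is a crude stand-in for a
# real anti-aliasing filter.
def decimate(samples, m):
    filtered = [
        sum(samples[max(0, i - m + 1): i + 1]) / min(i + 1, m)
        for i in range(len(samples))
    ]
    return filtered[::m]

print(decimate([0, 1, 2, 3, 4, 5, 6, 7], 2))   # [0.0, 1.5, 3.5, 5.5]
```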

In small fraction conversion, a common clocking rate can be used. The signal is oversampled at the lowest common rate, then decimated to the lower rate.
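For the 48 kHz to 44.1 kHz case, the fixed fractional ratio and the common clocking rate can be computed directly:

```python
import math

# 48 kHz -> 44.1 kHz: find the fixed fractional ratio and the lowest
# common (oversampled) rate.
src, dst = 48_000, 44_100
g = math.gcd(src, dst)          # 300
up, down = dst // g, src // g   # interpolate by 147, decimate by 160
common = src * up               # lowest common rate: 7,056,000 Hz
print(f"ratio {up}/{down}, common rate {common:,} Hz")
```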

For variable ratio, much digital signal processing is required. The samples may need to occur at any point in time relative to the input rate; however, the system will only have a finite set of points in time to assign a new sample. The number of points will be dependent on the strength of the DSP.

The assignment of the new samples is akin to the assignment of quantization levels; there is a certain amount of rounding error involved. The manifestation of this error is jitter. The worse the error, the higher the jitter.

Obviously, the better the system's DSP, the better the conversion will be. Comparison of different equipment's conversion performance is a good idea.

