AET 2050 - DAW Production
Sampling and Quantization: Quality Issues
The question often arises as to what sample rate and quantization resolution are necessary for a certain quality of audio.
What are the effects of selecting certain values?
What are the standards employed?
What steps can be taken to improve the quality?
Psychoacoustic Limitations
Digital audio can approach the limits of human hearing in terms of sound quality; however, a poorly engineered product can still sound very poor.
Choices of sampling parameters, noise shaping, and quantization affect frequency response, distortion, and perceived dynamics.
The capabilities of the human ear should be regarded as the standard against which the quality of digital systems is measured. Models based upon human sensitivity are used.
Work done by Louis Fielder and Elizabeth Cohen suggests that a dynamic range of 115 dB is required for natural reproduction in a good consumer environment. This considers a noise floor of about 4 dB SPL and a maximum level of 120 to 129 dB SPL for common musical performances in favored listening conditions, and accounts for limitations of the equipment used.
Sampling Rate
The choice of sampling rate determines the maximum audio bandwidth available. The conventional wisdom is that the sampling rate should be no higher than necessary to reproduce the desired frequencies.
If conventional wisdom states that audible frequency extends to 20 kHz, then sampling should be just over 40 kHz for quality audio. Two standards exist that meet this expectation:
Compact Disc rate of 44.1 kHz
'Professional' rate of 48 kHz
48 kHz leaves enough room for downward varispeed settings. (Varispeed changes speed of the recording medium, which in turn changes the sampling frequency.)
44.1 kHz allows full use of the 20 kHz bandwidth, and oversampling converters allow for quality filtering. There is also about 8% less data than for 48 kHz samples.
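As a rough illustration of the trade-off between the two rates, the sketch below computes the available audio bandwidth (half the sampling rate) and the raw linear PCM data rate. The function names are made up for this example, and 16-bit stereo is assumed only because it matches the Compact Disc format.

```python
def nyquist_bandwidth_hz(sample_rate_hz):
    """Highest frequency that can be represented at this sampling rate."""
    return sample_rate_hz / 2


def pcm_bit_rate(sample_rate_hz, bits_per_sample=16, channels=2):
    """Raw linear PCM data rate in bits per second (audio data only)."""
    return sample_rate_hz * bits_per_sample * channels


if __name__ == "__main__":
    for rate in (44_100, 48_000):
        print(rate, nyquist_bandwidth_hz(rate), pcm_bit_rate(rate))
    # Fraction of data saved by choosing 44.1 kHz over 48 kHz
    print(1 - pcm_bit_rate(44_100) / pcm_bit_rate(48_000))
```

Both rates leave a margin above 20 kHz for the anti-aliasing filter; 48 kHz simply leaves a wider one at the cost of more data.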
Contemporary systems have doubled, or even quadrupled, the traditional baseband sample rates. It is very common now to work at 88.2 kHz, 96 kHz, or even 192 kHz.
Other common rates (see Rumsey, p.35):
44.056 kHz and 47.952 kHz - used when standard-sampled digital audio is synchronized with NTSC video equipment.
32 kHz - used in some broadcast applications, such as television and FM radio. Also used for extended play DATs.
22.05 kHz - half the CD sampling rate, used in older computers.
11.025 kHz - one quarter the CD rate. Used in older computers for low-quality sound playback.
8 kHz - used in the telephone system.
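Several of the rates above are simple multiples or fractions of the two standard rates. The short sketch below shows the arithmetic; it assumes the usual NTSC "pull-down" factor of 1000/1001, and the function name is invented for this example.

```python
def ntsc_pull_down(sample_rate_hz):
    """Rate after slowing the clock by the NTSC factor of 1000/1001."""
    return sample_rate_hz * 1000 / 1001


if __name__ == "__main__":
    for base in (44_100, 48_000):
        print(base, "->", round(ntsc_pull_down(base) / 1000, 3), "kHz")
    # Half and quarter of the CD rate
    print(44_100 / 2, 44_100 / 4)
```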
The number of bits per sample dictates the dynamic range as well as the signal-to-error (or signal-to-noise) ratio of a digital audio system.
The traditional standard for linear PCM systems has been 16 bits. It affords a dynamic range of over 96 dB. Note that this is adequate for most applications, but fails to meet Fielder's spec of 122 dB for subjectively noise-free reproduction. To achieve this level would require 21 bits or more, which is attainable with current technology (DVD, quality converters).
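The relationship between bit depth and dynamic range can be checked with a few lines of arithmetic. This sketch uses the simple ratio of the largest to the smallest quantization step, 20·log10(2^n), which is roughly 6 dB per bit; the function names are illustrative only.

```python
import math

DB_PER_BIT = 20 * math.log10(2)   # about 6.02 dB


def dynamic_range_db(bits):
    """Ratio of full scale to one quantization step, in dB."""
    return bits * DB_PER_BIT


def bits_for_dynamic_range(target_db):
    """Smallest whole number of bits that reaches the target dynamic range."""
    return math.ceil(target_db / DB_PER_BIT)


if __name__ == "__main__":
    print(round(dynamic_range_db(16), 2))    # 16-bit linear PCM
    print(bits_for_dynamic_range(122))       # bits needed for 122 dB
```

Under this simple model, 16 bits gives a little over 96 dB, and 122 dB calls for 21 bits, which agrees with the figures quoted above.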
See Rumsey p.37 for common bit depths found for certain applications. Note that the original format for Sony PCM recorders was 14 bits, and that many early samplers used only 12 bits!
Any professional audio workstation of any value now uses 24 bits as a practical working bit depth. Whereas debate exists as to what is the optimum sample rate, there is little argument that using a larger bit depth will result in a better-quality product.
It is a good idea to work with digital audio in the deepest bit depth available; however, eventually the product must fit into a delivery standard - today, that standard is 16 bits. This means that bit reduction, or the lowering of the number of bits per sample, must take place.
Bit reduction is a very important process. The lowest bits must not be simply truncated, or quantization error is introduced that is similar to A/D conversion with no dither.
To properly reduce the bit depth, the samples need to first be dithered digitally - that is, a dither signal is calculated and the samples are altered accordingly. The level of the dither must be sufficient to match the new bit level.
Also, the original signal's level should be raised so that its peaks are close to the maximum allowable level. This will maintain as much dynamic range as possible. See Rumsey, p.41.
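The difference between truncating and dithering can be sketched in a few lines. This is a minimal example, not a production requantizer: it assumes 24-bit integer samples reduced to 16 bits, and it uses triangular (TPDF) dither with a peak of one new LSB, which is one common choice rather than the only one. All names are invented for the illustration.

```python
import math
import random

SHIFT = 8            # 24-bit -> 16-bit drops 8 bits
STEP = 1 << SHIFT    # one 16-bit LSB expressed in 24-bit units


def truncate(sample_24):
    """Drop the low bits with no dither (the approach warned against above)."""
    return sample_24 >> SHIFT


def dither_and_round(sample_24, rng):
    """Add TPDF dither of +/-1 new LSB, then round to the nearest 16-bit value."""
    tpdf = (rng.random() - rng.random()) * STEP
    q = math.floor((sample_24 + tpdf) / STEP + 0.5)
    return max(-32768, min(32767, q))   # clip to the 16-bit range


def mean_dithered(sample_24, count=20_000, seed=0):
    """Average of many dithered conversions of the same input value."""
    rng = random.Random(seed)
    return sum(dither_and_round(sample_24, rng) for _ in range(count)) / count


if __name__ == "__main__":
    quiet = 64   # one quarter of a 16-bit LSB
    print(truncate(quiet), mean_dithered(quiet))
```

A constant input of 64 is a quarter of a 16-bit LSB. Truncation always returns 0, so the detail is lost, while the dithered results average close to 0.25: the low-level information survives, carried in the noise.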
Noise shaping can be implemented during requantization. The noise shaping relocates the noise energy away from the most critical areas (around 4 kHz) and moves that energy up into less audible higher frequencies.
Reducing the midband noise can increase the dynamic range of the signal, almost to the equivalence of a 20-bit signal.
Different curves of noise shaping are available, and the choice as to which is used is subjective. See Rumsey, p.42.
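The basic mechanism behind noise shaping is error feedback: the error made on one sample is subtracted from the next before it is requantized. The sketch below is the simplest first-order version, which tilts the noise toward high frequencies; it is not one of the psychoacoustically weighted curves used in commercial products. It assumes 24-bit integer input reduced to 16 bits, omits clipping for brevity, and uses invented names.

```python
import math
import random

SHIFT = 8
STEP = 1 << SHIFT    # one 16-bit LSB expressed in 24-bit units


def requantize_noise_shaped(samples_24, seed=0):
    """First-order error-feedback requantizer with TPDF dither (no clipping)."""
    rng = random.Random(seed)
    prev_error = 0.0
    out = []
    for s in samples_24:
        target = s - prev_error                        # feed back the last error
        dither = (rng.random() - rng.random()) * STEP
        q = math.floor((target + dither) / STEP + 0.5)
        prev_error = q * STEP - target                 # error made on this sample
        out.append(q)
    return out


if __name__ == "__main__":
    quiet = [64] * 1000                                # a quarter of a 16-bit LSB
    shaped = requantize_noise_shaped(quiet)
    print(sum(shaped) * STEP - sum(quiet))             # accumulated error stays small
```

Because each error is cancelled on the following sample, the errors do not accumulate: the total over any run of samples stays within about one and a half new LSBs. In frequency terms, the noise is reduced at low frequencies and increased at high frequencies.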
Error Correction
Errors are a fact of life in digital systems. To eliminate them is impractical. It is a given that errors exist, and a system must be in place to deal with them.
Virtually all digital systems implement error detection, which relies on redundant data recorded along with the original data.
If the redundancy is robust enough, bad data can be corrected so that the original data is maintained.
With systems such as audio, if the error cannot be corrected, it can be concealed using interpolation. If the data cannot be interpolated, the most extreme concealment is muting.
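The concealment strategy described above can be sketched as follows. This is a hypothetical illustration, not the scheme used by any particular format: it assumes the detection stage has already flagged the bad samples, uses straight-line interpolation between the nearest good neighbors, and mutes any run longer than an arbitrary `max_gap`.

```python
def conceal(samples, bad, max_gap=4):
    """Interpolate across short runs of flagged samples; mute longer runs."""
    out = list(samples)
    n = len(out)
    i = 0
    while i < n:
        if not bad[i]:
            i += 1
            continue
        start = i
        while i < n and bad[i]:
            i += 1
        end = i                      # index of the first good sample after the run
        gap = end - start
        if start > 0 and end < n and gap <= max_gap:
            left, right = out[start - 1], out[end]
            for k in range(start, end):
                frac = (k - start + 1) / (gap + 1)
                out[k] = left + (right - left) * frac
        else:
            for k in range(start, end):
                out[k] = 0           # most extreme concealment: mute
    return out


if __name__ == "__main__":
    print(conceal([0, 10, 999, 30, 40], [False, False, True, False, False]))
```

The interpolated values are only estimates of the lost audio, which is why this approach is acceptable for sound but not for ordinary computer data.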
With typical computer systems, the validity of data is imperative; if there is bad data, it cannot be replaced with interpolated data. Because of the nature of audio (or video), however, interpolation can be tolerated.
Jitter is short-term variation in the positions of audio samples in time. Ideally, all samples occur with exactly the same interval between. Variations in this interval can cause audible distortion.
Jitter has similar properties to quantization error, but along the x-axis.
Figure: A distorted signal demonstrates how jitter can manifest.
Jitter has several potential sources including:
Poor quality sample clocks
Electrical noise or interference
Inferior interconnections
Jitter is quantified in two ways:
amplitude (how far has the clock drifted) - measured in time
rate, or frequency of the variations
The effect of jitter is difficult to quantify, but its presence gives the impression of loss of instrument definition, narrowing of stereo image, collapse of soundstage depth, etc.
Jitter is more apparent in higher frequencies than lower frequencies. If it is periodic rather than random, it has an effect similar to 'flutter.' For a given test tone, it will produce sidebands spaced above and below the tone at the jitter frequency. While the level of this distortion is low, it can still be a significant artifact. Research shows that an amplitude of only 5 ns can have a significant effect in a 16-bit signal.
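A back-of-the-envelope estimate shows why a few nanoseconds matter and why high frequencies suffer most. The sketch assumes a full-scale sine wave and a small timing error, so the amplitude error is approximately the signal's maximum slope (2πf for unit amplitude) multiplied by the timing error. It is a worst-case figure at the zero crossing, not a measurement, and the function name is invented.

```python
import math


def worst_case_jitter_error_lsb(freq_hz, jitter_s, bits=16):
    """Approximate peak amplitude error, in LSBs, caused by a timing error.

    Assumes a full-scale sine of amplitude 1.0 (full scale spans -1 to +1).
    """
    amplitude_error = 2 * math.pi * freq_hz * jitter_s   # max slope x time error
    lsb = 2.0 / (2 ** bits)                              # size of one step
    return amplitude_error / lsb


if __name__ == "__main__":
    for freq in (1_000, 20_000):
        print(freq, round(worst_case_jitter_error_lsb(freq, 5e-9), 2))
```

With 5 ns of timing error, a full-scale 1 kHz tone is disturbed by about one 16-bit LSB, while a 20 kHz tone is disturbed by roughly twenty.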
Sampling-Rate Conversion
Sampling-rate conversion is a very important and often-used process in digital audio, such as when transferring from a 48k master to a 44.1k CD. Another example would be when two units are running on separate clocks or when one unit's clock exhibits excessive jitter.
Ideally, conversion would be done by converting to analog and then back to digital at the new rate. (In fact, this is often a most viable choice, depending on the equipment being used.) However, there are always the problems of multiple conversions and filters altering the signal.
A digital solution is preferred. The process involves analyzing the original samples and interpolating new samples to fit the new sampling rate.
There are three basic categories of rate conversion:
Integer ratio
Small fraction ratio
Variable ratio
Theoretically, the system will interpolate a new set of samples, then discard the extras, leaving only the ones needed for the new rate. In reality, this is an excess of unneeded computations. The practical approach combines interpolation and decimation into one process.
For integer ratio conversion, the process is fairly simple. The samples always line up with the higher rate. This is the process used by oversampling converters when dropping down to the original sample rate.
In small fraction conversion, a common clocking rate can be used. The signal is oversampled at the lowest common multiple of the two rates, then decimated to the target rate.
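The interpolation and decimation factors for a fixed-ratio conversion follow directly from the two rates. The sketch below only computes the factors and the common clocking rate; it does not perform any filtering or resampling, and the function names are made up for the example.

```python
from fractions import Fraction
from math import gcd


def conversion_factors(src_hz, dst_hz):
    """Return (interpolate_by, decimate_by) for converting src_hz to dst_hz."""
    ratio = Fraction(dst_hz, src_hz)      # automatically reduced to lowest terms
    return ratio.numerator, ratio.denominator


def common_rate_hz(src_hz, dst_hz):
    """Lowest rate that is a whole multiple of both sampling rates."""
    return src_hz * dst_hz // gcd(src_hz, dst_hz)


if __name__ == "__main__":
    print(conversion_factors(48_000, 44_100), common_rate_hz(48_000, 44_100))
    print(conversion_factors(96_000, 48_000))   # integer ratio case
```

For 48 kHz to 44.1 kHz the factors are 147 and 160, so the common rate is about 7 MHz. For 96 kHz to 48 kHz the conversion is a simple decimation by two.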
For variable ratio, much digital signal processing is required. The samples may need to occur at any point in time relative to the input rate; however, the system will only have a finite set of points in time to assign a new sample. The number of points will be dependent on the strength of the DSP.
The assignment of the new samples is akin to the assignment of quantization levels; there is a certain amount of rounding error involved. The manifestation of this error is jitter. The worse the error, the higher the jitter.
Obviously, the better the system's DSP, the better the conversion will be. Comparison of different equipment's conversion performance is a good idea.