2. Data Reduction for Audio
Objectives
- Identify the need for data reduction systems.
- Differentiate between lossless and lossy reductions.
- Define critical audio bands and their importance in perceptual coding.
- Compare and contrast the various coding systems currently in use.
Why Data Reduction?
As a data coding format, PCM is robust but inefficient.
It requires a data rate of about 1.4 Mbit/s for 2 channels of 16 bits at 44.1 kHz (2 × 16 × 44,100 = 1,411,200 bit/s).
Many contemporary delivery systems (such as those used on the Internet) require much lower data rates, as low as 128 kbit/s.
Getting a stereo stream down to that rate takes a data reduction of more than 10:1.
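The arithmetic behind these figures can be checked with a short sketch. The function name `pcm_bit_rate` is made up for illustration, and the 128 kbit/s target is assumed here to apply to the whole stereo stream.

```python
def pcm_bit_rate(channels, bits_per_sample, sample_rate_hz):
    """Raw PCM data rate in bits per second."""
    return channels * bits_per_sample * sample_rate_hz


# CD-style PCM: 2 channels, 16 bits, 44.1 kHz
cd_rate = pcm_bit_rate(2, 16, 44_100)   # 1,411,200 bit/s, about 1.4 Mbit/s

# Assumed delivery target for the complete stereo stream
target_rate = 128_000

ratio = cd_rate / target_rate           # 11.025, i.e. a bit more than 10:1
print(f"PCM rate: {cd_rate} bit/s, reduction needed: {ratio:.3f}:1")
```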
Data Reduction Process
Direct data reduction
- Can be done in one of two ways: reduce the sampling rate or reduce the quantization (word length).
  - Reducing the sampling rate narrows the frequency response (the highest frequency that can be represented is half the sampling rate).
  - Reducing the quantization reduces precision and introduces quantization noise.
- Usually, these compromises are unacceptable for high-quality audio.
Instead of reducing the actual data, a system is needed that can represent the data in a coded form that uses less actual data.
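The cost of the two direct approaches can be illustrated with a minimal sketch. It quantizes a test sine wave to different word lengths and measures the resulting signal-to-noise ratio, and it reports the frequency limit for a given sampling rate. The helper names, the 997 Hz test tone, and the 0.99 amplitude are assumptions chosen for illustration, not part of any standard measurement.

```python
import math


def quantize(x, bits):
    """Round x (in the range -1..1) to a signed grid with 2**bits levels."""
    levels = 2 ** (bits - 1)
    q = round(x * levels)
    q = max(-levels, min(levels - 1, q))   # clip to the representable range
    return q / levels


def snr_db(bits, n=44_100, freq_hz=997.0, sample_rate_hz=44_100):
    """Signal-to-quantization-noise ratio of a test sine, in dB."""
    signal_power = 0.0
    noise_power = 0.0
    for i in range(n):
        x = 0.99 * math.sin(2 * math.pi * freq_hz * i / sample_rate_hz)
        err = quantize(x, bits) - x
        signal_power += x * x
        noise_power += err * err
    return 10 * math.log10(signal_power / noise_power)


def nyquist_hz(sample_rate_hz):
    """Highest frequency that the sampling rate can represent."""
    return sample_rate_hz / 2


for bits in (16, 12, 8):
    print(f"{bits} bits: SNR about {snr_db(bits):.1f} dB")

for rate in (44_100, 22_050):
    print(f"{rate} Hz sampling: response up to {nyquist_hz(rate):.0f} Hz")
```

Each bit removed costs roughly 6 dB of signal-to-noise ratio, and halving the sampling rate halves the usable bandwidth, which is why neither shortcut is attractive for high-quality audio.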
Data Reduction Coding
- Lossless coding: allows data to be reduced and then reconstructed exactly into its original form.
  - Method used by ZipIt or StuffIt
  - Works well for most compression needs
  - Does not yield a good enough reduction for audio (about 2.5:1).
- Lossy coding: a system in which the data is not precisely reconstructed when decoded, but a (hopefully) accurate approximation is produced from the reduced data.
  - Lossy techniques reduce bit depth
  - This raises quantization noise, but does so in such a way that the noise is hidden.
  - The result is a change in dynamics and noise, but one that is (optimally) small enough to be indistinguishable to the human ear.
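The difference between the two families can be sketched in a few lines. Here Python's general-purpose `zlib` compressor stands in for an archive-style lossless coder (an assumption, since the tools named above use their own formats), and dropping the low 8 bits of each sample stands in for a crude lossy step with no perceptual model. The synthetic 440 Hz tone is only test data, so the compression ratio it produces says nothing about real program material.

```python
import math
import struct
import zlib

# One tenth of a second of a 440 Hz tone as 16-bit samples (test data only)
samples = [int(20000 * math.sin(2 * math.pi * 440 * i / 44_100))
           for i in range(4410)]
pcm = struct.pack("<%dh" % len(samples), *samples)

# Lossless: the decoded bytes are identical to the original bytes
packed = zlib.compress(pcm, 9)
restored = zlib.decompress(packed)
lossless_ok = restored == pcm


def drop_bits(sample, bits_removed):
    """Discard the lowest bits of a sample (crude bit-depth reduction)."""
    step = 1 << bits_removed
    return (sample // step) * step


# Lossy: the decoded samples only approximate the originals
lossy = [drop_bits(s, 8) for s in samples]
max_error = max(abs(a - b) for a, b in zip(samples, lossy))

print("lossless round trip exact:", lossless_ok)
print("lossy round trip exact:", lossy == samples)
print("largest lossy error (in 16-bit steps):", max_error)
```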
Perceptual Coding
Perceptual coding is based on psychoacoustic principles
which define how some audio is masked by other audio.
Critical Bands
- Masking occurs between tones inside the same frequency band.
- A given tone will mask another tone within that band, but will not affect tones outside of that band.
- The bandwidth of these bands is approximately 1/3 octave for frequencies between 300 and 20,000 Hz.
- Bands are not fixed but continuously variable: any audible tone will create a band centered around it.
- The masking tone raises the threshold of hearing around that tone.
- Sound beneath that threshold is masked.
- Sound outside of the tone's critical band will not be affected.
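Using the 1/3-octave approximation above, the band around a masking tone can be sketched as follows. The function names are invented for illustration, the band is assumed to be centered geometrically on the masker, and the approximation is only meant for maskers between roughly 300 Hz and 20,000 Hz.

```python
def critical_band_edges(center_hz):
    """Lower and upper edge of a 1/3-octave band centered on center_hz."""
    half_width = 2 ** (1 / 6)          # half of 1/3 octave on each side
    return center_hz / half_width, center_hz * half_width


def in_same_band(masker_hz, probe_hz):
    """True if the probe tone falls inside the masker's band."""
    low, high = critical_band_edges(masker_hz)
    return low <= probe_hz <= high


low, high = critical_band_edges(1000.0)
print(f"Band around 1000 Hz: {low:.1f} Hz to {high:.1f} Hz")
print("1050 Hz inside band:", in_same_band(1000.0, 1050.0))
print("2000 Hz inside band:", in_same_band(1000.0, 2000.0))
```

A 1050 Hz tone falls inside the band of a 1000 Hz masker and is a candidate for masking, while a 2000 Hz tone lies outside it and is unaffected.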
Perceptual Coding Process
- Perceptual coding effectively reduces the bit rate of a signal by applying the psychoacoustic principles of critical bands and masking.
- The signal's sample rate is maintained, but the word length is selectively and dynamically decreased based on signal conditions. Masking is taken into account so that the increase in quantization noise is rendered as inaudible as possible.
- Works by taking advantage of the masking characteristic of human hearing
  - Sounds that occur at the same time as louder sounds can be removed because we can't hear them
  - Frequencies above 15 kHz are sometimes not encoded at all (less common now)
  - A lower number of bits is used to encode high frequencies
    - We can't hear the harmonics produced by a 20 kHz tone, so "squaring off" the waveform will be imperceptible
- The encoding process
  - Incoming audio is split into many narrow frequency bands
  - Audio is sent to an auditory model that decides what audio is "important" and what isn't
  - Each band can then be requantized using fewer bits
  - Only levels above the threshold of perception are quantized. The higher the level, the more bits are used
  - Requantizing effects are constrained within the bands, and are more effectively masked by the band's program material
  - The process repeats every few milliseconds
- Although the coding is lossy, the listener will theoretically not perceive the loss.
- Encoders generally allow the user to select a target bit rate, usually measured in kbps (kilobits per second)
- Results in audio files drastically reduced in size with little perceived difference in quality
- Useful for end-user delivery only, because audio quality decreases from generation to generation
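The bit-allocation step of the encoding process can be sketched in isolation. This is a toy illustration, not how any particular codec works: it assumes the filter bank and auditory model have already produced a signal level and a masking threshold (both in dB) for each band, and the numbers in the example frame are invented. It relies on the rule of thumb that each bit of word length is worth about 6 dB.

```python
import math

DB_PER_BIT = 6.02  # each extra bit lowers quantization noise by about 6 dB


def bits_for_band(level_db, threshold_db, max_bits=16):
    """Bits needed to keep quantization noise under the masking threshold."""
    headroom_db = level_db - threshold_db
    if headroom_db <= 0:
        return 0                       # band is masked: do not encode it
    return min(max_bits, math.ceil(headroom_db / DB_PER_BIT))


def allocate(levels_db, thresholds_db):
    """Per-band word lengths for one frame of audio."""
    return [bits_for_band(level, threshold)
            for level, threshold in zip(levels_db, thresholds_db)]


# Hypothetical frame with four bands (values invented for illustration)
levels = [60.0, 72.0, 35.0, 20.0]
thresholds = [30.0, 30.0, 40.0, 25.0]

allocation = allocate(levels, thresholds)
print(allocation)   # [5, 7, 0, 0]
```

The two bands that sit below their thresholds receive no bits at all, and the louder of the audible bands receives the longer word length. A real encoder would repeat this decision for every frame, that is, every few milliseconds.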
Application
- Real-time streaming applications
  - Low bit rates allow high-quality audio to be "streamed" from the web in real time
  - Speech audio transmitted through digital cell phone networks is encoded and decoded in real time
  - Delay can be noticeable
- The encoding software/algorithm plays an important role in the quality of the encoded audio
EBU subjective listening tests on low-bitrate codecs
Maximum Streaming Rates

| Target Audience | Maximum Streaming Rate |
|---|---|
| 14.4 Kbps modem | 10 Kbps |
| 28.8 Kbps modem | 20 Kbps |
| 56 Kbps modem | 34 Kbps |
| 64 Kbps ISDN | 45 Kbps |
| 112 Kbps dual ISDN | 80 Kbps |
| Corporate LAN | 150 Kbps |
| 256 Kbps DSL/cable modem | 225 Kbps |
| 384 Kbps DSL/cable modem | 350 Kbps |
| 512 Kbps DSL/cable modem | 450 Kbps |
| 786 Kbps DSL/cable modem | 700 Kbps |
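The table can be turned into a small lookup for choosing an encoder bit rate. This is a sketch only: the names `MAX_STREAMING_KBPS`, `max_stream_rate`, and `fits` are invented for illustration, the values are copied from the table as given, and the Corporate LAN row is left out because the table gives it no nominal connection speed.

```python
# (connection speed in Kbps, maximum streaming rate in Kbps), from the table
MAX_STREAMING_KBPS = [
    (14.4, 10),
    (28.8, 20),
    (56, 34),
    (64, 45),
    (112, 80),
    (256, 225),
    (384, 350),
    (512, 450),
    (786, 700),
]


def max_stream_rate(connection_kbps):
    """Streaming rate for the fastest table entry the connection can match.

    Returns None if the connection is slower than every entry.
    """
    best = None
    for speed, stream_rate in MAX_STREAMING_KBPS:
        if speed <= connection_kbps:
            best = stream_rate
    return best


def fits(encoder_kbps, connection_kbps):
    """True if a stream at encoder_kbps can be delivered in real time."""
    limit = max_stream_rate(connection_kbps)
    return limit is not None and encoder_kbps <= limit


print(max_stream_rate(56))    # 34
print(fits(128, 256))         # True
print(fits(128, 112))         # False
```

The usable streaming rate is always lower than the nominal connection speed, so a 128 kbps stream that fits a 256 Kbps DSL line will not play in real time over dual ISDN.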