Audio Formats

Anyone who watches the behaviour of big companies that want to sell us ‘home entertainment’ would probably agree with the maxim that, “We never learn anything from history!” Once again we are going through various ‘Format Wars’...

For about 20 years the main carrier for home audio has been the Compact Disc. Now things may be changing rapidly. The fastest-growing new home entertainment format in history is the ‘DVD’, or more specifically the DVD Video. This has given us hours of video and multichannel sound on a disc that looks pretty much like the CD. Unfortunately, this isn’t the only new contender. Although less well known at present, there are at least two other types of consumer disc format, and, to make things confusing, they all look pretty much like a CD.

In this series of articles I want to concentrate on the audio performance of the different formats. I’ll give a brief summary of how each type of disc works, and what their good or bad points may be. The formats I will cover are:

CD-A (Compact Disc – Audio)
DVD-V (Digital Versatile Disc – Video)
DVD-A (Digital Versatile Disc – Audio)
SACD (Super Audio Compact Disc)

Both CD and DVD can also be used to store data in computer formats, but I will ignore that here. I’ll start off with examining the Audio CD in detail and use this as a benchmark against which we can compare the newer formats. I’ll also ignore details like the way data is actually encoded onto the discs to avoid reading errors, etc, and just focus on the potential audio performance when the disc and player are working correctly.

CD-A

The CD-A standard provides stereo audio in LPCM (Linear Pulse Code Modulation) format. This means the information about the audio waveforms for the Left- and Right-Hand speakers is stored as a series of binary numbers whose values scale linearly with the sound pressure the loudspeakers are supposed to produce. In effect, this is just the same as writing down a regular series of pressure readings. The format uses 16-bit values (words or samples) and samples each channel 44,100 times every second at regular intervals. Thus the audio bitrate from a CD-A is 2 x 16 x 44,100 = 1,411,200 bits/second.
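To make the idea of ‘writing down a regular series of pressure readings’ concrete, here is a minimal Python sketch. It assumes a pure 1 kHz tone at half of full scale, and the function names (lpcm_samples, lpcm_bitrate) are my own hypothetical ones, not part of any CD standard or library. It samples the waveform 44,100 times a second, rounds each reading to a signed 16-bit integer, and repeats the bitrate arithmetic above. Real recorders apply refinements that are ignored here.

```python
import math

SAMPLE_RATE = 44_100   # CD-A samples per second, per channel
BITS = 16              # CD-A bits per sample
CHANNELS = 2           # stereo: Left and Right

def lpcm_samples(freq_hz, amplitude, seconds, rate=SAMPLE_RATE, bits=BITS):
    """Take regular 'pressure readings' of a sine wave and round each to a signed integer.

    amplitude is a fraction of full scale (0 to 1).
    """
    full_scale = 2 ** (bits - 1) - 1          # 32767 for 16-bit signed samples
    n_samples = round(seconds * rate)
    return [
        round(amplitude * full_scale * math.sin(2 * math.pi * freq_hz * n / rate))
        for n in range(n_samples)
    ]

def lpcm_bitrate(bits=BITS, rate=SAMPLE_RATE, channels=CHANNELS):
    """Total LPCM bitrate in bits/second: bits per sample x samples/sec x channels."""
    return bits * rate * channels

left = lpcm_samples(1000, 0.5, seconds=1.0)   # one second of a 1 kHz tone, one channel
print(len(left), left[:5])
print(lpcm_bitrate())                          # 1411200
```

Each channel is just a list of integers; a stereo stream interleaves two such lists, which is where the factor of 2 in the bitrate comes from.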

From the mathematics of Information Theory we can show that using 44,100 samples/second we can only record a bandwidth (range of frequencies) that is less than half this value. As a result, CD-A sounds can only cover frequencies up to just a tad under 22.05 kHz. When the CD system was developed this was regarded as fine as it was assumed that – in general – most people can’t hear anything above about 20kHz, and that most music didn’t contain much at these ‘ultrasonic’ frequencies. These assumptions are contentious, though, and we may come back to this point later on.
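The half-the-sample-rate limit can be checked numerically. The sketch below (Python, with hypothetical names) samples a 25 kHz cosine and a 19.1 kHz cosine at 44.1 ksamples/sec and shows that the two produce the same series of numbers, so the samples alone can’t tell them apart. This effect is called ‘aliasing’, and it is why content above 22.05 kHz can’t be recorded faithfully and is normally filtered out before sampling.

```python
import math

FS = 44_100  # CD-A sampling rate, samples/second

def sampled_cosine(freq_hz, n_samples, fs=FS):
    """Return the sample values of a unit cosine at freq_hz, taken fs times per second."""
    return [math.cos(2 * math.pi * freq_hz * n / fs) for n in range(n_samples)]

nyquist = FS / 2            # 22,050 Hz: the upper limit on recordable bandwidth
ultrasonic = 25_000         # a tone above that limit
alias = FS - ultrasonic     # 19,100 Hz: the frequency it 'folds back' to

a = sampled_cosine(ultrasonic, 1000)
b = sampled_cosine(alias, 1000)
max_diff = max(abs(x - y) for x, y in zip(a, b))
print(nyquist, alias, max_diff < 1e-9)   # 22050.0 19100 True
```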

Music and speech don’t just contain varying frequencies. The sound level (loudness) also varies from whispering quietness to being deafeningly loud. We can define a quantity called the Dynamic Range to indicate this as the ratio of the loudest/quietest sound power levels a system can handle. This value is usually given in dB (decibels, or tenths of a bel), which is a logarithmic unit. 10 dB means a power ratio of 10:1, 20 dB means 100:1, 30 dB means 1000:1, and so on, i.e. every 10 dB means another factor of ten in power terms.
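A quick way to check the dB arithmetic is to let the computer do it. This is just a sketch with hypothetical function names; it uses the power-ratio definition given above, i.e. 10 × log₁₀ of the ratio.

```python
import math

def power_ratio_to_db(ratio):
    """Decibels for a power ratio: every factor of ten adds 10 dB."""
    return 10 * math.log10(ratio)

def db_to_power_ratio(db):
    """Inverse: the power ratio represented by a given number of decibels."""
    return 10 ** (db / 10)

for ratio in (10, 100, 1000):
    print(f"{ratio}:1 -> {power_ratio_to_db(ratio):.0f} dB")
print(db_to_power_ratio(30))   # 1000.0
```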

Using Information Theory again, the 16-bit values on a CD-A mean we can specify 2¹⁶ = 65,536 different levels, i.e. the largest and the smallest sound pressure changes we can indicate from one sample to another are in the ratio 65,536:1. This is an amplitude ratio, so we have to square it to get a power ratio. Hence, simply on a sample-by-sample basis, CD-A has a Dynamic Range of 4,294,967,296:1, which comes out as 96.3 dB. In other words, the data recorded on a CD-A should allow us to record and replay sounds over a range of power levels of about 4 billion to 1. For most domestic audio purposes this seems to be fine. Indeed, most people will never notice any background noise from a CD-A that is actually due to the CD-A format – although you may hear noises unintentionally recorded or added at the studio!
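The same calculation can be written out for any word length. This Python sketch (the function name is hypothetical) follows the argument above step by step: count the levels, square the amplitude ratio to get a power ratio, then convert to dB. Feeding in 24 bits gives the figure quoted for DVD-A further on.

```python
import math

def lpcm_dynamic_range_db(bits):
    """Sample-by-sample dynamic range of linear PCM, following the argument in the text.

    2**bits levels give an amplitude ratio of 2**bits:1; squaring gives the
    power ratio, so the range in dB is 10*log10((2**bits)**2) = 20*log10(2**bits).
    """
    amplitude_ratio = 2 ** bits
    power_ratio = amplitude_ratio ** 2
    return 10 * math.log10(power_ratio)

print(2 ** 16, (2 ** 16) ** 2)               # 65536 4294967296
print(round(lpcm_dynamic_range_db(16), 1))   # 96.3
print(round(lpcm_dynamic_range_db(24), 1))   # 144.5
```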

The newer formats all seek to improve on CD-A in various ways by doing things like having more bits per sample (more precision and hence a higher dynamic range), and/or more samples per second (wider bandwidth so higher frequencies can be recorded), and/or more than two channels to give ‘surround sound’ rather than stereo. Although DVD-V is currently much more widely used than either DVD-A or SACD, I’ll go on to look at DVD-A next as it is the easiest format to explain in terms of its advantages over CD-A.

DVD-A

DVD-A offers a set of ‘improvements’ over CD-A. It uses 24-bit LPCM samples, so its sample-by-sample dynamic range is 144.5 dB. This means that, theoretically, it can provide a range that is about 50 dB (100,000 times) greater than CD-A. Now – for human hearing – the loudest sounds we can tolerate without pain and swift hearing damage are about 120 dB greater than the quietest sounds we can just detect. Hence the 24-bit DVD-A samples can cover a range of loudness that is much greater than actually required for replaying music. On this basis the DVD-A format is essentially ‘perfect’ in that any limitation should be in our hearing, not the disc! Indeed, so far as I know, no-one has yet managed to produce microphones and studio equipment that can reach the same high levels of performance as DVD-A.
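Another way to look at the 120 dB figure is to ask how many bits a sample would need to span it. Each extra bit doubles the amplitude range, adding 20 × log₁₀2 ≈ 6.02 dB. The sketch below assumes the round 120 dB figure quoted above and uses a hypothetical helper name; on that basis about 20 bits would suffice, leaving 24-bit samples with roughly 24 dB to spare.

```python
import math

DB_PER_BIT = 20 * math.log10(2)   # about 6.02 dB of dynamic range per extra bit

def bits_needed(range_db):
    """Smallest LPCM word length whose sample-by-sample dynamic range covers range_db."""
    return math.ceil(range_db / DB_PER_BIT)

hearing_range_db = 120            # approximate span from threshold to pain, as quoted in the text
n = bits_needed(hearing_range_db)
print(n, round(n * DB_PER_BIT, 1))                    # 20 120.4
print(round(24 * DB_PER_BIT - hearing_range_db, 1))   # 24.5
```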

DVD-A can provide various sampling rates, with 96 ksamples/sec and 192 ksamples/sec being the most probable choices. This means the recordings can provide an audio bandwidth up to about 48 kHz (for the 96 ksamples/sec rate) or 96 kHz (for the 192 ksamples/sec rate). In both cases this means we can record sounds up to frequencies that are traditionally regarded as being well above what humans can hear. Hence on this basis as well the DVD-A format should be essentially ‘perfect’.

The format also provides for multichannel surround sound, and in this respect it is similar to the other new formats. (In fact, the original CD-A format specifications also allowed for 4-channel surround, but so far as I know, no-one ever used this and it has become redundant.) DVD-A also allows for much longer uninterrupted playback than CD-A. Hence we can expect to get an entire opera onto one side of a disc and play it through with no breaks if we wish. By contrast, CD-A is limited to about 80 minutes. Although primarily for audio, the DVD-A format also allows for images, video, or text to be included if required.

We can now draw up a table comparing the performance specifications of various DVD-A audio modes with CD-A as a reference benchmark. When looking at the table, bear in mind that there are other audio modes allowed with DVD-A that I have omitted for reasons I’ll explain later.

Format	Bits/sample	Samples/sec (x10³)	Channels	Total bitrate (x10⁶ bits/sec)	Bandwidth (kHz)	Dynamic range (dB)
CD-A	16	44·1	2	1·4112	22·05	96·3
DVD-A	24	96·0	2	4·608	48·0	144·5
DVD-A	24	192·0	2	9·216	96·0	144·5
DVD-A	24	96·0	4	9·216	48·0	144·5

From the above we can see that DVD-A can offer much higher technical performance than CD-A, but the cost is a much higher required bitrate.
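All the entries in the table follow from three numbers per mode: bits per sample, samples per second and channel count. As a sketch (the LpcmMode class is my own hypothetical construct, not anything defined by the DVD-A standard), the table can be regenerated like this:

```python
import math
from dataclasses import dataclass

@dataclass
class LpcmMode:
    name: str
    bits: int        # bits per sample
    rate: int        # samples per second, per channel
    channels: int

    def bitrate_mbps(self):
        """Total bitrate in millions of bits per second."""
        return self.bits * self.rate * self.channels / 1e6

    def bandwidth_khz(self):
        """Upper limit on bandwidth: half the sampling rate, in kHz."""
        return self.rate / 2 / 1e3

    def dynamic_range_db(self):
        """Sample-by-sample dynamic range: 20*log10(2**bits)."""
        return 20 * math.log10(2 ** self.bits)

MODES = [
    LpcmMode("CD-A", 16, 44_100, 2),
    LpcmMode("DVD-A", 24, 96_000, 2),
    LpcmMode("DVD-A", 24, 192_000, 2),
    LpcmMode("DVD-A", 24, 96_000, 4),
]

for m in MODES:
    print(f"{m.name:6} {m.bits:3} {m.rate / 1e3:6.1f} {m.channels:2} "
          f"{m.bitrate_mbps():7.4f} {m.bandwidth_khz():6.2f} {m.dynamic_range_db():6.1f}")
```

Note that doubling either the sample rate or the channel count doubles the bitrate, which is why the last two rows share the same figure.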

An aside – Compression and bit-packing

Before going on to DVD-V we need to consider an issue that is vital to understanding how DVD-V’s work. It is also important for DVD-A. This topic is what has come to be called ‘data compression’.

Anyone familiar with the various types of bitmap image file used on computers will be aware that some formats ‘compress’ the image data to use fewer bits in a file, and hence squeeze more image files or details onto a hard disc, etc. Broadly speaking, these come in two types:

Lossy compression. e.g. JPEGs. These reduce the required file size by discarding information about details which it is felt ‘won’t be missed’. The result is a loss of information about image details but, ideally, only details whose removal won’t be noticed have been discarded.
Loss-free compression. e.g. GIFs or PNGs. These seek to find redundancies or inefficiencies in the ways the data bits record the image information and alter the data record to remove these from the file format without losing any details or actual information.

The same kinds of processes can be applied to audio data. Two audio examples that have been around for some years are the ATRAC (Adaptive TRansform Acoustic Coder) system developed by Sony for their MiniDisc recorders, and PASC (Precision Adaptive Subband Coding) which was employed by Philips for their Digital Compact Cassette recorders.

As with image bitmap files there are a variety of ways to do this, but the critical question for each becomes: is it a ‘lossy’ system or not? From a ‘hi fi’ point of view, compression systems that discard data have to be treated with caution. They may have discarded details, which means the sound has been altered. Once the details have been discarded they are ‘lost’, and the damage is unrepairable unless you can go back and start again from the original source.
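To see why discarded data can’t be recovered, consider the crudest possible lossy step: throwing away the least significant bits of each sample. This is a toy illustration with hypothetical function names, much cruder than ATRAC or PASC, which choose what to discard far more selectively. But it shows the essential point: once the low bits are gone, scaling back up doesn’t restore the original values.

```python
def requantize(samples, keep_bits, from_bits=16):
    """Toy lossy step: discard the lowest (from_bits - keep_bits) bits of each sample."""
    shift = from_bits - keep_bits
    return [s >> shift for s in samples]

def expand(coded, keep_bits, from_bits=16):
    """Best-effort reconstruction: scale back up; the discarded bits cannot be recovered."""
    shift = from_bits - keep_bits
    return [c << shift for c in coded]

original = [1000, 1003, 1017, 998, -5, -1234]
restored = expand(requantize(original, keep_bits=8), keep_bits=8)
print(restored)               # [768, 768, 768, 768, -256, -1280]
print(restored == original)   # False
```

Four different input values have collapsed onto one output value, so no later process can tell which of them was originally there.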

DVD-A’s do allow for compressed recording modes. However, these use MLP (Meridian Lossless Packing), which is designed to rearrange the information with no loss of any details, but squeeze the result down so that it requires fewer bits. This approach is sometimes called “bit packing”. The details of MLP are complex, so I won’t even try to explain them here. Fortunately the details don’t matter for our current purposes. The key point is that MLP is ‘transparent’ so far as the DVD-A user is concerned. No information is lost in the process, and the recovered information is identical to that which was MLP processed. Hence using MLP allows the creator of a DVD-A to squeeze in even more channels than in the above table without having to reduce the sampling rate or number of bits per sample. The PASC and ATRAC systems work in a completely different way to MLP, and discard data in the process of compressing down the number of bits. This distinction becomes important when we consider the audio behaviour of DVD-V...
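MLP itself is far too elaborate to reproduce here, but the general idea of ‘bit packing’ can be illustrated with a much simpler trick. The sketch below is NOT MLP; it is a toy delta coder with hypothetical names, run on an assumed smooth test waveform. It stores only the change from one sample to the next. For smoothly varying audio these changes are small numbers that need fewer bits, and the original samples can be rebuilt exactly by adding them back up.

```python
import math

def delta_encode(samples):
    """Store the first sample, then only the change from each sample to the next."""
    return [samples[0]] + [b - a for a, b in zip(samples, samples[1:])]

def delta_decode(deltas):
    """Undo delta_encode exactly by keeping a running total."""
    out = [deltas[0]]
    for d in deltas[1:]:
        out.append(out[-1] + d)
    return out

def bits_for(values):
    """Signed word length that is enough to hold every value in the list."""
    biggest = max(abs(v) for v in values)
    return biggest.bit_length() + 1   # +1 for the sign bit

# A smooth, slowly varying test waveform (an assumption): neighbouring samples are close.
samples = [round(20000 * math.sin(2 * math.pi * n / 400)) for n in range(400)]
deltas = delta_encode(samples)

print(bits_for(samples), bits_for(deltas[1:]))   # 16 10
print(delta_decode(deltas) == samples)           # True: nothing has been lost
```

A practical packer would then store the small differences in shorter words; the point here is only that the rearrangement is exactly reversible.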

DVD-V

The DVD-V format is very different to either CD-A or DVD-A, as its primary use is for video material with accompanying sound. Thus its main use is for things like home cinema, and most of the data on the disc is for the images, not the sound. The amount of ‘raw’ data required for video is so high that even the generous capacity of a DVD is nowhere near enough for a film two hours long. Hence, for DVD-V to work, it was essential that the video information be compressed using ‘lossy’ methods that discard details as well as pack the remaining information as efficiently as possible. Since audio is regarded here as just an accompaniment to the pictures, the common standard is to compress the audio using ‘lossy’ methods as well, although, because the data rates required for audio are so much lower than for video, this is not always essential.
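Some rough arithmetic shows just how large the gap is. The figures in this sketch are my own assumptions, not taken from any DVD specification: PAL-sized frames of 720 × 576 pixels, 25 frames/sec, 24 bits per pixel with no compression, and a 4.7 GB single-layer disc. On those assumptions a two-hour film would fill about 48 discs, while stereo 16-bit/48 ksamples/sec LPCM audio is well under one percent of the video data rate.

```python
# Assumptions (not from the article): PAL-resolution frames of 720 x 576 pixels,
# 25 frames/s, 24 bits per pixel uncompressed, and a 4.7 GB single-layer disc.
WIDTH, HEIGHT, FPS, BITS_PER_PIXEL = 720, 576, 25, 24
DISC_BYTES = 4.7e9
FILM_SECONDS = 2 * 60 * 60

video_bps = WIDTH * HEIGHT * FPS * BITS_PER_PIXEL   # raw video bits per second
audio_bps = 2 * 16 * 48_000                         # stereo 16-bit LPCM at 48 ksamples/sec

film_bytes = video_bps * FILM_SECONDS / 8
print(f"raw video: {video_bps / 1e6:.1f} Mbit/s")
print(f"two-hour film: {film_bytes / 1e9:.0f} GB, about {film_bytes / DISC_BYTES:.0f} discs")
print(f"stereo LPCM audio: {audio_bps / 1e6:.3f} Mbit/s, "
      f"{100 * audio_bps / video_bps:.1f}% of the video rate")
```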

The audio standards for DVD-V are therefore of three kinds, and you may see audio ‘tracks’ in each of these standards on various discs:

LPCM. This consists of stereo (2-channel) audio using 16-bit samples at a rate of 48 ksamples/second. This format is not compressed. In effect it is just like CD-A, but using the slightly higher sampling rate of 48 ksamples/sec to get a 24 kHz bandwidth. On many discs this is just referred to as “PCM Stereo”.
Dolby Digital (DD). This can come in various numbers of channels, from mono up to six or more channels for surround, subwoofers, etc. This is a lossy compressed format. On some discs it may be labelled as “AC3”. Dolby Digital is essentially a trade name for this kind of compression.
Digital Theater Systems (DTS). As with DD, this comes in various flavours, and is a lossy compressed format. DTS is also a trade name.

DD and DTS are not LPCM formats. They store the information in a completely different way to LPCM. However, I’ll postpone explaining how they work to a later article.

The official standards for DVD-V require that sound (if present) must be available in either LPCM or DD format, as the absolute minimum. Other formats can be included if required. A disc can have as many soundtracks and formats as the creators require, so some discs have half a dozen tracks in different languages, formats, etc. In my experience, only music DVD-V’s tend to have an LPCM track, and this is often accompanied by a surround DD track, giving the user a choice. Films rarely have an LPCM track and tend to rely on DD. DTS is sometimes added, as many people argue it is better than DD.

We can easily compare the DVD-V LPCM format with CD-A and DVD-A as the issue is not complicated by questions regarding the behaviour of lossy compression. The above diagram shows one way of comparing these audio formats in graphical terms. This sort of graphic shows what can be called the “Shannon Space”^[1] that each format can occupy in terms of dynamic and frequency ranges. The broken red line is the threshold of (good!) human hearing. The vertical axis of the graph shows the range of sound power levels, and the horizontal axis shows the frequency range. For the CD-A and DVD-V LPCM examples I’ve assumed the gain is such that the noise/resolution level is set just about at the hearing threshold. It can be seen that DVD-V LPCM is slightly better than CD-A but that DVD-A is vastly better than either of them! (You may find it useful to also consider the article on human hearing that may be in this issue of LwT to help interpret the meaning of the above graphic.)
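One way to put a number on the ‘Shannon Space’ is the Shannon-Hartley capacity, C = B log₂(1 + S/N), where B is the bandwidth and S/N the signal-to-noise power ratio (the equation mentioned in footnote [1]). In the sketch below I assume that each format’s sample-by-sample dynamic range can stand in for S/N and that the bandwidth is exactly half the sample rate, so the results are idealised per-channel figures (function and variable names are hypothetical). Reassuringly, each comes out at almost exactly bits × samples/sec, and the 24-bit/96 ksamples/sec DVD-A mode has more than three times the capacity of CD-A.

```python
import math

def shannon_capacity(bandwidth_hz, dynamic_range_db):
    """Shannon-Hartley capacity C = B * log2(1 + S/N) in bits/second, treating the
    format's dynamic range (in dB) as the signal-to-noise power ratio S/N."""
    snr = 10 ** (dynamic_range_db / 10)
    return bandwidth_hz * math.log2(1 + snr)

# (bandwidth in Hz, sample-by-sample dynamic range in dB) for one channel of each format
FORMATS = {
    "CD-A": (22_050, 20 * math.log10(2 ** 16)),
    "DVD-V LPCM": (24_000, 20 * math.log10(2 ** 16)),
    "DVD-A 24/96": (48_000, 20 * math.log10(2 ** 24)),
}

for name, (bw, dr) in FORMATS.items():
    print(f"{name:12} {shannon_capacity(bw, dr) / 1e3:8.1f} kbit/s per channel")
```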

The above ignores some techniques that are routinely used to improve the perceived performance of CD-A and other LPCM formats. In the next article I’ll deal with these and explain how non-LPCM formats like DD and SACD work. We will then be able to compare all these formats in terms of the relative audio performances they can offer.

2500 words
Jim Lesurf
28th Apr 2004

^[1] Claude Shannon was the person who first came up with the equation we can use to quantify the capacity of a system to convey ‘information’. So this term is named in his honour.
