Sound file sizes and the quality of playback
Introduction
A number of factors can affect the size of a sound file and the quality of its playback. These include the sample rate, the sample resolution (how many bits are used to store each sample) and the way the file is compressed.
Recording sound digitally and playing it back
When you speak or play music into a microphone, the microphone converts the sound waves into a voltage. As the sound waves vary, so does the voltage. The microphone is connected to a computer's sound card, which samples (measures) the microphone's voltage at regular intervals. We can show a short piece of sound as a graph of voltage against time. Because the sound reaching the microphone is continuously changing, it is known as an 'analogue signal'.
Let's do what the sound card does and sample the voltage. We will start by taking a reading every second, starting at 0 seconds. Each time a sample is taken, we'll convert the reading into a binary number.
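To see what the sound card is doing, here is a minimal Python sketch of sampling. Because the original recording is not available, it assumes a made-up 'microphone voltage' (a slow sine wave scaled between 0 and 1) and 4-bit readings, as in our file. The function name `sample_and_digitise` and the wave itself are hypothetical, chosen only for illustration.

```python
import math

def sample_and_digitise(signal, duration, interval, bits=4):
    """Read signal(t) every `interval` seconds from 0 to `duration`
    and return each reading as a fixed-width binary string.
    Assumes (hypothetically) that signal(t) returns a value from 0.0 to 1.0."""
    levels = 2 ** bits                              # 4 bits -> 16 levels (0..15)
    count = int(round(duration / interval)) + 1     # include the reading at t = 0
    samples = []
    for i in range(count):
        t = i * interval
        value = round(signal(t) * (levels - 1))     # nearest whole level
        value = max(0, min(levels - 1, value))      # keep inside the valid range
        samples.append(format(value, f"0{bits}b"))  # e.g. 8 -> "1000"
    return samples

def wave(t):
    # Hypothetical microphone voltage: a slow sine wave scaled to 0..1
    return 0.5 + 0.5 * math.sin(2 * math.pi * t / 10)

print(" ".join(sample_and_digitise(wave, duration=10, interval=1)))
```

Changing `interval=1` to `interval=0.5` produces 21 readings instead of 11, which is where the extra file size comes from.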
We have now taken some sound and 'digitised' it. Each reading is stored as a 4-bit binary number, so our file looks like this: 0000 1000 0101 1011 1100 0101 1010 1000 1000 1001 0010
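Turning the file back into numbers is a useful check. This short Python sketch splits the 4-bit file above into its samples, converts each one from binary, and counts how many bits the file needs.

```python
# The 11 four-bit samples from the text, one per second from 0 s to 10 s
digital_file = "0000 1000 0101 1011 1100 0101 1010 1000 1000 1001 0010"

codes = digital_file.split()
values = [int(code, 2) for code in codes]      # binary -> whole number
size_bits = sum(len(code) for code in codes)   # 4 bits per sample

print(values)      # [0, 8, 5, 11, 12, 5, 10, 8, 8, 9, 2]
print(size_bits)   # 44
```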
We can now save our digital file and play it back whenever we want to. We do that by taking each sample of data and outputting it, via our sound card, to the speakers. The sound card converts the digital values back into a varying voltage (digital-to-analogue conversion) that the speakers can play. We can 'hear' what this might sound like by plotting our binary data back on the graph, on top of the original sound wave, so that we can compare them.
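One simple way to model playback is to hold each sample's value until the next sample arrives, which gives a 'staircase' shape. This Python sketch does that for the eleven samples in our file. It is a simplified model, not how every sound card works: real hardware uses a digital-to-analogue converter and usually smooths its output. The function `play_back` and its `step` parameter are hypothetical.

```python
# Sample values decoded from the 4-bit file in the text (one per second)
values = [0, 8, 5, 11, 12, 5, 10, 8, 8, 9, 2]

def play_back(values, interval, step=0.25):
    """Hold each sample's value until the next one (a 'staircase').
    Returns (time, level) pairs every `step` seconds."""
    duration = (len(values) - 1) * interval              # time of the last sample
    points = []
    for k in range(int(round(duration / step)) + 1):
        t = k * step
        index = min(int(t // interval), len(values) - 1)  # most recent sample
        points.append((t, values[index]))
    return points

staircase = play_back(values, interval=1)
print(staircase[:6])
# [(0.0, 0), (0.25, 0), (0.5, 0), (0.75, 0), (1.0, 8), (1.25, 8)]
```

Between readings the output stays flat, so any change in the real sound during that second is simply missed.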
The red line represents the digital file. In an ideal world, the red line would sit directly on top of the blue line, which represents the original sound we recorded. If it did, the sound played back would be exactly the same as the sound that was recorded. It doesn't, so the playback won't sound exactly like the original. The reason for this is our 'sample interval'. Taking a reading every second gave us a small file, but the quality was poor because the sound changed a lot between readings and those changes were never captured. If we halved the interval between readings, we would roughly double the file size but should improve the playback quality. Let's take a reading every half a second instead of every second. Here is the data:
And here is the data plotted on top of the original sound wave:
This is a big improvement on what we had before, although the file size has roughly doubled. We could reduce the interval between samples again, perhaps taking a reading every 0.25 seconds, and this would improve the recording even more, bringing it even closer to the original. The price, of course, would be an even bigger file. In recording software you will often see the term 'sample rate'. This is simply the number of readings taken each second, measured in hertz (Hz), so it is just another way of expressing the sample interval: an interval of 0.5 seconds is a sample rate of 2 Hz. Audio CDs, for example, use a sample rate of 44,100 Hz.
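The effect of the sample interval can be measured rather than just seen on a graph. This Python sketch assumes a hypothetical original sound (a smooth sine wave lasting 10 seconds), samples it every 1, 0.5 and 0.25 seconds, plays it back with a 'hold each sample until the next one' model, and reports the sample rate, the number of samples and the average difference from the original. Quantisation is ignored here so that only the interval changes.

```python
import math

def wave(t):
    # Hypothetical original sound: a smooth wave with a 10-second period
    return math.sin(2 * math.pi * t / 10)

def average_error(interval, duration=10.0, step=0.01):
    """Average gap between the original wave and a playback that holds
    each sample's value until the next sample is taken."""
    total = 0.0
    n = int(round(duration / step))
    for k in range(n):
        t = k * step
        last_sample_time = math.floor(t / interval) * interval
        total += abs(wave(t) - wave(last_sample_time))
    return total / n

for interval in (1.0, 0.5, 0.25):
    rate = 1 / interval                       # samples per second (Hz)
    samples = int(round(10 / interval)) + 1   # readings from 0 s to 10 s
    print(f"interval {interval} s, rate {rate} Hz, "
          f"{samples} samples, average error {average_error(interval):.3f}")
```

Going from 1 s to 0.5 s gives 21 samples rather than 22, because both recordings include the reading at 0 seconds. That is why the file size roughly doubles rather than exactly doubling, while the average error falls each time the interval is halved.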
The next time you 'rip' a CD to turn it into a set of MP3 files and have to select the quality of the sound, you will understand what is going on. You are choosing how much data is stored for each second of music. Most ripping software shows this as a bit rate, such as 128 kbps or 320 kbps, rather than as a sample rate, but the trade-off is the same: a higher setting gives a more faithful copy of the original songs and a bigger file. This matters because an MP3 player or phone has a fixed amount of storage space, and you don't want it all used up by three songs! The trick is to find a balance between playback quality and file size.
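A rough calculation shows why the quality setting matters for storage. This Python sketch assumes a hypothetical 4 GB player, three-minute songs, and that a file's size is simply its bit rate multiplied by its length (ignoring headers and tags). The uncompressed CD figure comes from 44,100 samples per second × 16 bits × 2 channels.

```python
# Uncompressed CD audio: 44,100 samples/s x 16 bits x 2 channels (stereo)
CD_BIT_RATE_KBPS = 44_100 * 16 * 2 / 1000   # 1411.2 kbps

def songs_that_fit(storage_bytes, bit_rate_kbps, song_seconds=180):
    """Whole songs of `song_seconds` that fit in `storage_bytes`."""
    bytes_per_song = bit_rate_kbps * 1000 * song_seconds / 8
    return int(storage_bytes // bytes_per_song)

storage = 4 * 1000**3   # hypothetical 4 GB player
for kbps in (128, 320, CD_BIT_RATE_KBPS):
    print(f"{kbps} kbps: {songs_that_fit(storage, kbps)} three-minute songs")
```

Under these assumptions the player holds about 1,388 songs at 128 kbps, 555 at 320 kbps, and only 125 as uncompressed CD audio.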
Sound compression and codecs
Uncompressed sound files can be very large. This is a problem if you have limited space on a storage device, or if you want to listen to 'streamed' music over the Internet. With streaming, you listen to the music as it is downloaded, rather than downloading and saving the whole song first and then playing it. As the first few seconds of a streamed song are downloaded, they are put into a special storage area called a 'buffer', and then playback starts. While that part of the song is playing, more of the song is downloaded and buffered. If the file is too large for your connection, however, it can't be downloaded quickly enough to keep the buffer filled. The song then pauses for a while to give your computer time to download more of it before playback can resume.
To get round this problem, sound files are usually 'compressed'. This means that an algorithm called a codec (short for coder/decoder) is applied to the raw sound file to squash it into fewer bits, and the same codec decodes it again when it is played. The compressed file is much smaller, so it can be downloaded and streamed more quickly.
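The link between file size and smooth streaming can be modelled second by second. In this hypothetical Python sketch, the connection delivers a fixed number of kilobits each second, the song needs `bit_rate_kbps` kilobits for every second played, and playback starts once 3 seconds are buffered. The function `simulate_stream` and all the numbers are illustrative assumptions, not measurements of any real streaming service.

```python
def simulate_stream(song_seconds, bit_rate_kbps, download_kbps, start_buffer_s=3):
    """Simple model of streaming. Returns how many seconds playback
    spent paused, waiting for the buffer to refill, after it first started."""
    per_second = download_kbps / bit_rate_kbps   # seconds of audio fetched per real second
    downloaded = 0.0                             # seconds of audio received so far
    played = 0                                   # seconds of audio played so far
    paused = 0
    started = False
    while played < song_seconds:
        downloaded = min(song_seconds, downloaded + per_second)
        buffered = downloaded - played
        if not started:
            started = buffered >= start_buffer_s  # wait for the initial buffer
            continue
        if buffered >= 1:
            played += 1                           # enough data: play one second
        else:
            paused += 1                           # buffer empty: song pauses
    return paused

print(simulate_stream(60, bit_rate_kbps=128, download_kbps=64))
print(simulate_stream(60, bit_rate_kbps=64, download_kbps=64))
```

With a 64 kbps connection, the 128 kbps file keeps pausing, while the same song compressed to 64 kbps plays without a break.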
Lossy and lossless compression
Files can be compressed using different types of codecs. One type is called 'lossy compression'. Lossy codecs remove parts of the sound that are judged relatively unimportant, for example sounds outside the range of human hearing or quiet sounds masked by louder ones. The removed data cannot be recovered. Popular lossy codecs include MP3 and Ogg Vorbis. Another type is called 'lossless compression'. These codecs keep all of the information in the sound file, so decoding gives back exactly the original samples. FLAC is a widely used lossless codec; the familiar WAV format, by contrast, usually stores the raw samples with no compression at all. Lossless codecs give you a smaller file than the raw sound file with no loss of quality. Lossy codecs, on the other hand, give you a much smaller file but with some loss of quality (although at higher bit rates most people would struggle to notice the difference).
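The difference between the two types can be demonstrated in Python. The lossless side uses `zlib`, a general-purpose lossless compressor from Python's standard library, not an audio codec like FLAC. The lossy side is a deliberately crude stand-in, not how MP3 or Ogg Vorbis work: it simply throws away the 4 least significant bits of each sample before compressing. The 'recording' is hypothetical: a wave plus a little random fine detail.

```python
import math
import random
import zlib

rng = random.Random(1)   # fixed seed so the example is repeatable
# Hypothetical 8-bit recording: a wave plus some random fine detail
raw = bytes(int(128 + 90 * math.sin(2 * math.pi * k / 100)) + rng.randint(-8, 8)
            for k in range(10_000))

# Lossless: compress the samples exactly as they are
lossless = zlib.compress(raw)
lossless_out = zlib.decompress(lossless)

# Lossy (crude stand-in): drop the 4 least significant bits, then compress
lossy = zlib.compress(bytes(s >> 4 for s in raw))
# Decode by rebuilding each sample at the middle of its 16-level band
lossy_out = bytes((q << 4) | 8 for q in zlib.decompress(lossy))

print("raw:", len(raw), "lossless:", len(lossless), "lossy:", len(lossy))
print("lossless exact?", lossless_out == raw)
print("lossy exact?", lossy_out == raw)
print("largest lossy error:", max(abs(a - b) for a, b in zip(raw, lossy_out)))
```

The lossless version decodes to exactly the original bytes; the lossy version is much smaller, but each sample can be off by up to 8 levels. Real lossy codecs are far cleverer about which detail they discard, using models of human hearing.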