How MP3 Compression Works
In general, compression is used to reduce the size of data for purposes like transmission, storage, backups, etc. With current computer power it is not only a matter of space, but of speed as well. Compression can be either lossless or lossy (not lossless).
An example of data that needs lossless compression is computer software. If parts of the original software got lost, the program would not work anymore. A well-known example of such compression can be seen in programs like WinZip.
For audio and video, it often doesn't matter if we lose a little bit of information. Some examples of such lossy formats are MPEG-1, MPEG-2, MP3 and MPEG-4 (DivX). All of these are suitable for both audio and video, with the exception of MP3, which can only be used for audio.
The amount of data needed for an uncompressed audio file can become huge. A 3 minute song takes roughly 30 Mbytes. Let's take a look at a small calculation to give you an idea why this is.
Suppose we want to record a 1 minute song to our computer's hard disk. Naturally, we would like to have it in CD quality, which means a sample rate of 44.1 kHz (= 44100 Hz) and a 16 bit (2 byte) sample format.
44100 Hz means that we take 44100 samples per second from the analog-to-digital converter of the sound card. We need to multiply this by 2, since we want stereo audio (left and right channel). Since each sample takes 2 bytes (16 bit resolution), we need to multiply by 2 once more.
So the size of 1 minute of music becomes:
44100 samples per second
x 2 channels (stereo: left and right)
x 2 bytes per sample
x 60 seconds in 1 minute
Total: 10,584,000 bytes (approx. 10 Mbytes)
Suppose you would like to download this from the Internet using a 28.8 kbit/s (28k8) modem. Even at the modem's full speed it would take you about 49 minutes of download time, for JUST 1 MINUTE of music! To save it, you would also need 8 disks (1.44 MB 3.5" diskettes).
So compression can be pretty useful. Our 1 minute of music could become a file of about 1 Mbyte using MP3 compression.
There are different forms of compression, and they can be combined. Let me explain this very simply (take it from me: in practice, this is much more complicated).
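The arithmetic above can be checked with a few lines of Python. This is only a sketch: the function and constant names are made up for illustration, the modem is assumed to deliver its full 28,800 bits per second with no overhead (so the download time is a best case), and a diskette is assumed to hold its raw capacity of 1,474,560 bytes.

```python
import math

SAMPLE_RATE_HZ = 44_100   # CD-quality sample rate
CHANNELS = 2              # stereo: left and right
BYTES_PER_SAMPLE = 2      # 16 bit resolution


def uncompressed_size_bytes(seconds):
    """Size in bytes of uncompressed CD-quality audio of the given length."""
    return SAMPLE_RATE_HZ * CHANNELS * BYTES_PER_SAMPLE * seconds


size = uncompressed_size_bytes(60)

# Assumption: the modem runs at its full nominal speed, no protocol overhead.
MODEM_BITS_PER_SECOND = 28_800
download_minutes = size * 8 / MODEM_BITS_PER_SECOND / 60

# Assumption: raw capacity of a "1.44 MB" 3.5" diskette.
DISKETTE_BYTES = 1_474_560
diskettes = math.ceil(size / DISKETTE_BYTES)

print(size, "bytes")
print(round(download_minutes), "minutes to download (best case)")
print(diskettes, "diskettes")
```

Changing the argument of uncompressed_size_bytes shows how quickly the numbers grow for a full-length song.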
For example, if data contains repeating patterns, we should be able to say "repeat pattern". So, say we have this pattern:
ABCCDDABCCDDABCCDDABCCDDABCCDDABCCDDABCCDDABCCDDABCCDDABCCDD
we could also write it down as:
10 x ABCCDD
The result is obvious: this is shorter, but it still holds the same information. This is compression!
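The "repeat pattern" idea can be sketched in Python as follows. This is a toy illustration, not how MP3 or WinZip actually work; the function names are made up, and the code only handles the simple case where the whole input is one pattern repeated.

```python
def compress_repeats(data):
    """Return (count, pattern) for the shortest pattern that rebuilds data."""
    n = len(data)
    if n == 0:
        return 0, ""
    # Try every pattern length that divides the input evenly, shortest first.
    for length in range(1, n + 1):
        if n % length == 0:
            pattern = data[:length]
            if pattern * (n // length) == data:
                return n // length, pattern


def decompress_repeats(count, pattern):
    """Rebuild the original data from the (count, pattern) pair."""
    return pattern * count


original = "ABCCDD" * 10
count, pattern = compress_repeats(original)
print(count, "x", pattern)  # 10 x ABCCDD

# Lossless: decompressing gives back exactly what we started with.
assert decompress_repeats(count, pattern) == original
```

Data without any repetition comes back as 1 x the whole input, so nothing is gained in that case.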
A different technique is converting a pattern to a mathematical formula.
Suppose we have this pattern of values:
0, 1, 4, 9, 16, 25, 36, 49, 64,
81, 100, 121, 144, 169, 196, 225, 256, 289, 324, 361, 400, 441, 484, 529,
576, 625, 676
This matches the values 0 through 26 for X in the formula:
Y = X^2 (so X to the power 2).
So we write down:
X=0...26, using Y=X^2
Once more: this is shorter too.
I have to admit that this is a very very simple example, but it's the principle that matters.
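As a small sketch of the formula idea in Python: the code below checks that the list of values really is a run of squares, and then stores only the count instead of the values. The function names and the ("squares", count) notation are invented for this example. The code counts X from 0, because the list starts with 0.

```python
values = [0, 1, 4, 9, 16, 25, 36, 49, 64,
          81, 100, 121, 144, 169, 196, 225, 256, 289, 324, 361, 400, 441, 484, 529,
          576, 625, 676]


def fits_squares(seq):
    """True if seq is exactly 0^2, 1^2, 2^2, ... for its whole length."""
    return all(y == x ** 2 for x, y in enumerate(seq))


def expand_squares(count):
    """Rebuild the list of the first `count` squares, starting at X = 0."""
    return [x ** 2 for x in range(count)]


# "Compress": if the formula fits, keep only its name and the number of values.
compressed = ("squares", len(values)) if fits_squares(values) else None
print(compressed)  # ('squares', 27)

# "Decompress": apply the formula again to get the original values back.
assert expand_squares(compressed[1]) == values
```

The catch, of course, is that real data rarely follows such a neat formula.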
What we can't hear, what we can't see.
Here we use the limits of the human senses. We humans can only hear sound in the range of 20 to 20,000 Hertz. CD recordings often hold an even wider range. By removing that extra part, we gain once more: the file becomes smaller.
That's not all. Some very short transitions (and I mean REALLY short ones) cannot be detected by the human ear. So we skip those as well, and once more the file becomes smaller.
We can do something similar with video. Can you come up with 5 colors? Now try 16 colors. Getting hard? Well, try 16 million colors. Pretty impossible, isn't it? Our vision can be tricked too and does not distinguish 16 million colors. So we leave out what can't be seen.
How does MP3 compression work?
By determining what portions of the audio we wouldn’t hear anyway and removing them, MP3 can compress audio with little apparent impact on the quality of the sound.
The key characteristic of our auditory system that MP3 utilizes is the fact that we can’t distinguish weak sounds when they occur at the same time as a louder sound of nearly the same frequency. This principle is known as auditory masking. A masking effect can be easily experienced using a recording that has some background noise present due to low sampling resolution. During quiet moments or breaks in the music, the noise might be very conspicuous, but when the music is loud the noise can’t be heard.
MP3 uses strong sounds as an opportunity to use fewer bits to represent that portion of the frequency spectrum, even though that raises the noise level. This works because as long as the noise is masked by the strong sound, the listener does not notice much change in the audio quality. Any time fewer bits can be used to represent the audio, greater compression is being performed.
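The trade-off between bits and noise can be made visible with a small Python experiment. This is a simplified sketch and not the actual MP3 procedure: it rounds the raw samples of a 1 kHz test tone to fewer bits, while a real encoder works on separate frequency bands. The test tone and the function names are assumptions made for this example.

```python
import math


def quantize(sample, bits):
    """Round a sample to the nearest multiple of the step size for this bit depth."""
    step = 2.0 / (2 ** bits)  # the range -1.0 .. 1.0 split into 2**bits steps
    return round(sample / step) * step


def snr_db(signal, bits):
    """Signal-to-noise ratio in dB after quantizing the signal to `bits` bits."""
    noise = [s - quantize(s, bits) for s in signal]
    signal_power = sum(s * s for s in signal) / len(signal)
    noise_power = sum(n * n for n in noise) / len(noise)
    return 10 * math.log10(signal_power / noise_power)


# One second of a 1 kHz tone sampled at 44100 Hz.
tone = [0.9 * math.sin(2 * math.pi * 1000 * n / 44100) for n in range(44100)]

for bits in (16, 8, 4):
    print(bits, "bits:", round(snr_db(tone, bits), 1), "dB signal-to-noise")
```

Each bit that is dropped costs roughly 6 dB of signal-to-noise ratio. The encoder's job is to drop bits only where a loud sound hides the extra noise.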
Stronger and weaker sounds occurring simultaneously aren’t the only type of masking effect that can be used by MP3 compression. There are also masking effects before and after strong sounds. In other words, it takes a small amount of time for the brain to process the change in sound level around a strong sound, and this is another opportunity for MP3 to use fewer bits for that portion of the audio.
There can be many masking effects overlapping and interacting with one another in any given moment of audio, and MP3 compression takes these interactions into account. MP3 encoding also considers the fact that the range of human hearing is between 20Hz and 20kHz and that our hearing is most sensitive in the 2-4kHz range of human speech.
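To get a feeling for how sensitivity changes with frequency, here is a sketch using a widely quoted textbook approximation of the threshold of hearing in quiet (usually attributed to Terhardt). It is an assumption brought in for illustration, not taken from the MP3 specification, and real encoders use their own tables. The function name is made up.

```python
import math


def threshold_in_quiet_db(freq_hz):
    """Approximate level (dB SPL) below which a pure tone is inaudible in silence."""
    khz = freq_hz / 1000.0
    return (3.64 * khz ** -0.8
            - 6.5 * math.exp(-0.6 * (khz - 3.3) ** 2)
            + 1e-3 * khz ** 4)


# Scan the audible range in 100 Hz steps and find where the threshold is lowest,
# which is where the ear is most sensitive.
frequencies = range(100, 20001, 100)
most_sensitive = min(frequencies, key=threshold_in_quiet_db)
print("lowest threshold near", most_sensitive, "Hz")

for f in (100, 1000, 3000, 10000, 18000):
    print(f, "Hz:", round(threshold_in_quiet_db(f), 1), "dB SPL")
```

The threshold is lowest in the region of a few kHz and climbs steeply towards both ends of the range, so sounds near the edges need to be much louder before we notice them.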
The science of figuring out how we perceive sound is known as psychoacoustics. So, the MP3 coding scheme is said to be utilizing a psychoacoustic model when it determines which parts of the audio it can leave out of an MP3 encoded file.
Psychoacoustic methods aren’t the only means of compression used by MP3. Additional compression is achieved by handling stereo information more efficiently than simply allocating half of the bits available to each channel. MP3 can allocate bits between the channels as needed based on the complexity of each (simple stereo mode), or it can encode one channel with the portion of the audio that is identical across both source channels while putting the difference in the second channel (joint stereo mode). Joint stereo mode tends to be the most efficient, and because the two source channels can be rebuilt from the shared part and the difference, the stereo information of the original is retained. MP3 also utilizes a type of data compression called Huffman coding, which replaces frequently occurring patterns with shorter codes that can be translated back when the file is decoded.
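The joint stereo idea of a shared part plus a difference can be sketched in Python as follows. This is a simplified illustration of the principle (often called mid/side coding) on a handful of made-up sample values, not the actual MP3 bitstream format; the function names are invented for this example.

```python
def encode_mid_side(left, right):
    """Turn left/right channels into a shared (mid) and a difference (side) channel."""
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    return mid, side


def decode_mid_side(mid, side):
    """Rebuild the left/right channels from the mid and side channels."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right


# Made-up samples: the two channels are identical in the first two positions.
left = [0.50, 0.25, -0.75, 0.00]
right = [0.50, 0.25, -0.50, 0.25]

mid, side = encode_mid_side(left, right)
print(side)  # [0.0, 0.0, -0.125, -0.125]

# Nothing is lost by the transformation itself.
assert decode_mid_side(mid, side) == (left, right)
```

Where the channels are alike, the difference channel is zero or close to it, and values like that take very few bits to store.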