Post by JohnG on Jul 5, 2011 13:50:43 GMT
Article 3: What is a wave file?
In this, the third of my opening series of three articles, I'm going to describe what a wave file (filename.wav) is and how it differs from a MIDI file. Again I'm going for clarity before precision.
What is a wave file? Again put as simply as I can, a wave file contains a series of numbers (thousands and thousands of numbers) that represent, pretty accurately, a sound, a series of sounds, someone talking, a song, a piece of music, etc. An audio CD contains a number of wave files.
Now, I imagine that doesn't help many of you very much, so let me try and amplify upon it. However, to do so I'm going to have to go into a little bit of technical detail but, I hope, not too much. What I do need to talk about, as simply as I can, is a process known as "sampling".
So when we talk about sampling in the context of audio what do we mean? We mean taking a measurement of the amplitude (electrical loudness) of a waveform (maybe the input from a microphone) at a split second in time and then representing that in a computer (or other digital instrument) as a number. Then we take another measurement a fraction of a second later (turning that into another number), and again and again and so on. If we continue to do this for a minute or so and then get the computer to write the data it has stored to a file, we have just created a wave file. In the same way we can instruct the computer to sample and save the sound coming out of a sound module or a keyboard. So what the computer has stored is a series of numbers that follows the waveform up and down. If we pass this file to another device it should be able to play it back (convert it from digital back to an analogue wave) and we will hear the original music. What I've just described, in very simple terms, is how an audio CD is made.
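For those who like to see things in code, here's a minimal sketch in Python (standard library only) that "samples" a mathematical sine wave and writes the resulting numbers out as a wave file. The file name and the 440 Hz test tone are just my illustrative choices.

```python
import math
import struct
import wave

SAMPLE_RATE = 44100   # measurements per second
DURATION = 1.0        # seconds of sound to create
FREQUENCY = 440.0     # pitch of the test tone

# Measure the height of a sine wave every 1/44,100th of a second
# and turn each measurement into a 16-bit signed number.
samples = []
for n in range(int(SAMPLE_RATE * DURATION)):
    t = n / SAMPLE_RATE
    amplitude = math.sin(2 * math.pi * FREQUENCY * t)
    samples.append(int(amplitude * 32767))

# Write the series of numbers to a file, with a small header
# describing the format: that's a wave file.
with wave.open("tone.wav", "wb") as f:
    f.setnchannels(1)            # mono
    f.setsampwidth(2)            # 2 Bytes (16 bits) per sample
    f.setframerate(SAMPLE_RATE)
    f.writeframes(struct.pack("<%dh" % len(samples), *samples))
```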
We probably don't realize it but we all use this technology every day when we make a telephone call. Somewhere in the circuit, usually at the telephone exchange (or c/o for the Americans amongst us), there's an analogue to digital (A/D) converter or ADC. In this case it samples our voice (the electrical signal coming down the line from the microphone in our telephone) at 8,000 times a second (i.e. the ADC takes 8,000 measurements every second) and uses 1 Byte (8 bits) to store each sample (or measurement). It then sends these Bytes across the network to the exchange (or c/o) nearest to the person we're calling and there the digital to analogue converter (DAC) converts the Bytes back to an analogue signal which it sends down the line to the person we're calling. So our voice is flowing through the telephone network (8 bits x 8,000 samples) at 64,000 bits per second. And it's mono not stereo. This is the international standard for digital telephony.
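If you'd like to check that telephone arithmetic yourself, a tiny sketch:

```python
samples_per_second = 8000  # the ADC measures the voice 8,000 times a second
bits_per_sample = 8        # each measurement is stored in 1 Byte
print(samples_per_second * bits_per_sample)  # 64000 bits per second, mono
```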
So far, so good?
For creating a CD clearly there must be some sort of international standard that defines how frequently we must sample the analogue signal and how many Bytes to use to represent each sample so that it accurately represents the original sound. For CD quality music we must measure the signal 44,100 times per second and each measurement is stored using 2 Bytes (16 bits) of computer memory. That allows us to divide our original signal into 65,536 different levels. Actually 32,767 different levels +ve, 32,768 levels -ve, and zero.
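And the levels arithmetic, for anyone who wants to verify it:

```python
bits = 16
positive = 2 ** (bits - 1) - 1   # 32767 levels above zero
negative = 2 ** (bits - 1)       # 32768 levels below zero
print(positive + negative + 1)   # plus zero itself: 65536 in total
```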
Now for some arithmetic.
Sample at 44,100 times per second, each sample is 2 Bytes and this time it's stereo (left and right channels) so x 2. 44,100 x 2 x 2 = 176,400 Bytes per second or x 60 seconds = 10,584,000 Bytes per minute. That's right, ten and a half million Bytes per minute! That means we can store a little more than 70 minutes of high quality audio on a CD with its capacity of over 700 MegaBytes.
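Here are the same sums in code so you can vary them yourself. (The 1024-based capacity figure is just one common convention, and real audio CDs hold a little more than this simple sum suggests because they use less error correction than data discs.)

```python
sample_rate = 44100       # samples per second
bytes_per_sample = 2      # 16 bits
channels = 2              # stereo

bytes_per_second = sample_rate * bytes_per_sample * channels
bytes_per_minute = bytes_per_second * 60
cd_capacity = 700 * 1024 * 1024   # "700 MegaBytes", 1024-based

print(bytes_per_second)                # 176400
print(bytes_per_minute)                # 10584000
print(cd_capacity / bytes_per_minute)  # about 69.3 minutes by this simple sum
```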
Q. So how does a MIDI file differ from a wave file?
A. A MIDI file holds a series of instructions that tell a computer or sound module or keyboard what note to play. How loud, what length, which voice and so on. A wave file, on the other hand, is the sound itself but stored in a digitized format.
Q. Can I convert from one format to the other?
A. Strictly speaking, no, not convert. But I can tell my MIDI file player to play the MIDI file then record the resulting sound output as a wave file and the result should be fine depending upon the quality of the sound module that I use. (I'd call that a process not a conversion but maybe I'm splitting hairs.)
Q. What about the other way?
A. Well, how do I convert the sound from a CD into instructions for a keyboard, especially when there may be lots of instruments playing at the same time? Pretty nearly impossible, I'd say. However, my software sequencer has a mechanism called "voice to score". It allows one voice (human or instrument) to be converted to MIDI notes using a microphone input. Using this it is possible to build up a MIDI file gradually, but with quite a bit of human intervention. (Again, I'd call this a process not a conversion.)
Q. You've mentioned two standards for sampling, are there more?
A. Yes, in fact lots. It depends what medium you're sampling for, e.g. telephone, AM radio, FM radio or CD. Two decisions need to be made. The first is what the frequency range of the audio signal is, and the second is the dynamic range (the difference between the quietest and the loudest sounds). These two dictate first the sampling rate and second the size (the number of bits or Bytes) of each sample. For the telephone I don't need a wide frequency range nor a huge difference between loud and soft, whereas for a CD I do. Clearly if I'm creating sound for an AM or FM radio system the requirements are somewhere in between.
Q. People mention audio cards and 24/96 in the same breath. What is that?
A. If you're recording sounds that you will eventually mix together to form a CD then the quality of the original needs to be better before you start to process it with digital effects etc. So a high quality audio card of today will sample at 96,000 times per second and use 3 Bytes (24 bits) to store each sample. In the final mix it will be converted down to 16/44.1 for writing to CD.
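To put numbers on the difference, a small sketch; data_rate is just an illustrative helper, not a standard function:

```python
def data_rate(sample_rate, bits, channels=2):
    """Bytes per second of uncompressed audio."""
    return sample_rate * (bits // 8) * channels

print(data_rate(44100, 16))  # 176400 Bytes/s: 16/44.1, CD quality
print(data_rate(96000, 24))  # 576000 Bytes/s: 24/96, over 3x the data
```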
If you're feeling very brave you might explore this page:
en.wikipedia.org/wiki/Sampling_%28signal_processing%29
Looking back over this article I wonder whether it's too much!
Feedback and questions of clarification are always welcome.
© John L. Garside, 2007.
Hi Michael,
Thanks for your enquiry. I thought when I tackled this one it might provoke a few more questions. I'll try to come up with some answers that may make sense to you.
Q. Why is the sample rate of say 44.1k samples per second described as 44.1 Kilohertz--what is the hertz bit?
A. Well, strictly speaking, we shouldn't refer to sampling using the term Hz! The unit Hertz is really there to define the frequency of waveforms like sound or radio waves. Sadly, like many technical terms, it has been picked up and used by people who may not have the scientific or engineering background to understand its precise meaning. Another term, Baud, has suffered the same fate.
But to answer your real question, Hertz is a measurement of frequency, i.e. cycles per second. Like many other measurements it is named after a notable scientist from the past, in this case a German physicist called Heinrich Rudolf Hertz who made important contributions in the field of electromagnetism. We use amps, for electrical current, named after André-Marie Ampère, this time a French physicist, and volts, named after the Italian physicist Alessandro Volta. Most of these people are from the 18th or 19th century.
Try this reference: en.wikipedia.org/wiki/Sound
Q. Also if using a bit rate of say 16 bits, this seems to be equal to 2 bytes. Thus are there 8 bits in one byte?
A. Yes, quite right, 8 bits to the Byte. I use capital letters to distinguish one from the other when abbreviated. So 8kbps is 8 kilobits per second but 8KB is 8 kiloBytes (maybe the size of a small MIDI file).
Here's another reference: en.wikipedia.org/wiki/Bytes
Q. Also, I assume the bit rate refers to the accuracy with which each sample is recorded in binary number format (ones and zeroes), thus the higher the bit rate the longer the binary number and the more accurate the sample and the greater the fidelity when converted back to analogue/sound---is this correct?
A. Mmmm. Not quite. First "bit rate" means transmission speed, i.e. bits per second. It refers, in this case, to the rate at which data will have to be written to the hard disk. It comes from multiplying the frequency of sampling (samples/sec.) by the size of each sample (1, 2 or 3 Bytes) expressed in bits (8, 16 or 24). So if I sample at 44.1K samples/second and take 2 Bytes (16 bits) per sample I get a bit rate of 44,100 x 16 = 705,600 bps (bits per second). If I increase the size of the sample to 3 Bytes then I get 1,058,400 bps. Similarly increasing or decreasing the frequency of sampling will alter the bit rate up or down.
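Here's that calculation as a little function you can experiment with; again, bit_rate is just an illustrative name:

```python
def bit_rate(samples_per_second, bytes_per_sample):
    """Bits per second written to disk for one channel."""
    return samples_per_second * bytes_per_sample * 8

print(bit_rate(44100, 2))  # 705600 bps
print(bit_rate(44100, 3))  # 1058400 bps
```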
So really we should stick with samples per second, in this case, because we need to use Hz and kHz to describe the upper and lower limits of the frequency of the sound wave that we are sampling.
Increasing the frequency of sampling allows us to follow the wave more accurately in time; increasing the number of bits per sample allows us to follow the wave more accurately in amplitude. Have a look at the diagrams on this page:
en.wikipedia.org/wiki/Discrete-time
They show how a wave is converted and the resulting series of numbers.
Imagine that we increased the number of vertical arrows (in the upper diagram) so that there's one more arrow between each pair of existing arrows. We've doubled the sample rate: the wave is now represented more accurately in time, but we're still (in this example) using the same number of bits per sample. Now imagine we were to increase the number of horizontal dotted lines (on the lower diagram), fitting an extra one between each pair. We've doubled the number of levels, which adds one more bit to each sample, and each tiny increment (or decrement) in the wave is more accurately recorded. (To go all the way from 1 Byte to 2 Bytes per sample we'd need 256 times as many levels.)
The corollary of this is that we can also record higher frequencies by increasing the sample rate and a greater dynamic range by increasing the sample size.
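If you'd like to see the amplitude side of that in action, here's a tiny Python sketch of quantization; quantize is just an illustrative helper:

```python
import math

def quantize(value, bits):
    # Snap a value in the range -1..1 to the nearest of the
    # 2**(bits - 1) - 1 positive levels, the same number of
    # negative levels, or zero.
    levels = 2 ** (bits - 1) - 1
    return round(value * levels) / levels

# The same point on a sine wave, stored at different sample sizes;
# more bits per sample means a smaller rounding error.
x = math.sin(1.0)
for bits in (4, 8, 16):
    print(bits, quantize(x, bits))
```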
Does this help? I do hope I haven't just confused you more. Please do ask.
Q. Finally, I'm not sure I understand your reference above to +ve and -ve?
A. Sorry, I did wonder whether to spell it in full. +ve is electrical positive and -ve is electrical negative. The terminals on a battery are often marked + and -. Sound waves fluctuate up and down above and below zero volts.
Keep the questions coming,
Best regards,
JohnG.