What is Digital Audio?

Disclaimer:
This section was copied from the English FLOSS Manuals and adapted to this wiki format. The original FLOSS Manuals were mainly intended as a manual for Pd-Extended, which is now outdated, but many parts of the tutorial are still beginner friendly and will be updated in this wiki.

If you want to contribute, feel free to edit this page or suggest changes in the Discord group #ideasAboutPdBook.


Since we'll be using Pure Data to create sound, and since Pd treats sound as just another set of numbers, it might be useful to review how digital audio works. We will return to these concepts in the audio tutorial later on.

How Sound is Generated




First, imagine a loudspeaker. It moves the air in front of it and makes a sound. To create this sound, the membrane of the speaker must vibrate backwards and forwards around its center position (where it sits at rest).

A sound is basically a pressure wave that travels through the air. It is generated when the movement of the speaker forces the air molecules to compress (when they are pushed by the membrane) and decompress (when the membrane moves back and the air molecules fill the space it leaves). The sound propagates as a longitudinal wave, as opposed to a transverse wave (such as the waves on a water surface).

The way the membrane vibrates determines what we call the waveform or waveshape of that sound. It may be very complex, such as pink noise, or very simple, like a sine wave.
 * Loud sound alert: lower your speaker volume before listening to the pink noise example linked in this paragraph.



A microphone works in reverse - vibrations in the air cause its membrane to vibrate. The microphone turns these acoustic vibrations into an electrical current. If you plug this microphone into your computer's soundcard and start recording, the soundcard makes thousands of measurements of this electric current per second and records them as numbers.

Frequency and Amplitude
If our speaker produces a periodic wave (a sine wave, for example), we perceive the sound as having a pitch, provided the speaker vibrates fast enough. The number of times per second it vibrates is what we call the frequency of the sound (the note, tone or pitch); conversely, the time it takes to complete one cycle is called the period of the wave.
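The relationship between frequency and period can be sketched in a few lines of Python. This is only an illustration, not Pd code: the function names are hypothetical, and the rate of 44,100 values per second is an assumption borrowed from the Sampling Rate and Bit Depth section further down.

```python
import math

SAMPLE_RATE = 44100  # assumed number of values per second (CD-quality rate)

def period_seconds(frequency_hz):
    """Time taken by one complete cycle: the inverse of the frequency."""
    return 1.0 / frequency_hz

def sine_wave(frequency_hz, duration_s, sample_rate=SAMPLE_RATE):
    """Return a list of values tracing a sine wave with amplitude 1."""
    count = int(round(duration_s * sample_rate))
    return [math.sin(2 * math.pi * frequency_hz * i / sample_rate)
            for i in range(count)]

# A 440 Hz tone (the note A) lasting one hundredth of a second.
a440 = sine_wave(440.0, 0.01)
print(period_seconds(440.0))  # a little over 2 milliseconds
print(len(a440))              # number of values generated
```

Doubling the frequency halves the period, which we hear as the same note one octave higher.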

The distance the membrane travels from its resting point determines the amplitude of the wave. We perceive the average extent of this travel, from maximum to minimum over some time, as the loudness of the sound, which is also frequency dependent (we perceive some frequencies as louder than others).

Normally, we measure frequency in hertz (Hz) and loudness or gain in decibels (dB).
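Decibels are a logarithmic way of comparing two amplitudes. As a rough sketch, the usual conversion for amplitudes is 20 times the base-10 logarithm of the ratio. The function names below are hypothetical, and using an amplitude of 1.0 as the reference is an assumption made for this example only.

```python
import math

def amplitude_to_db(amplitude, reference=1.0):
    """Express an amplitude in decibels relative to a reference amplitude."""
    return 20.0 * math.log10(amplitude / reference)

def db_to_amplitude(db, reference=1.0):
    """Inverse conversion: decibels back to a linear amplitude."""
    return reference * 10.0 ** (db / 20.0)

print(amplitude_to_db(1.0))   # the reference itself is 0 dB
print(amplitude_to_db(0.5))   # halving the amplitude loses about 6 dB
print(db_to_amplitude(-20.0)) # -20 dB is one tenth of the reference
```

Because the scale is logarithmic, each halving of the amplitude lowers the level by the same number of decibels, which is closer to how we perceive changes in loudness.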

This is a simplification: when the waveform is more complex we can perceive more than one pitch, and a real sound may be composed of many sinusoids of different frequencies and amplitudes, which we call partials.
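To illustrate the idea of partials, here is a minimal Python sketch that builds a more complex waveform by adding sine waves together. The frequencies and amplitudes are arbitrary values chosen for the example, the function name is hypothetical, and the rate of 44,100 values per second is an assumption.

```python
import math

SAMPLE_RATE = 44100  # assumed number of values per second

def additive(partials, duration_s, sample_rate=SAMPLE_RATE):
    """Sum sine waves described as (frequency_hz, amplitude) pairs."""
    count = int(round(duration_s * sample_rate))
    out = []
    for i in range(count):
        t = i / sample_rate  # time of this value, in seconds
        out.append(sum(amp * math.sin(2 * math.pi * freq * t)
                       for freq, amp in partials))
    return out

# Three partials: a 220 Hz fundamental plus two quieter, higher sinusoids.
partials = [(220.0, 0.5), (440.0, 0.25), (660.0, 0.125)]
tone = additive(partials, 0.01)
print(len(tone))
print(max(abs(v) for v in tone))  # never exceeds the sum of the amplitudes
```

Changing the relative amplitudes of the partials changes the shape of the waveform, and with it the character of the sound.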

Sampling Rate and Bit Depth


To make audio playable on a Compact Disc, the computer must make 44,100 measurements (called samples) per second, and record each one as a 16-bit number. One bit is a piece of information which is either 0 or 1, and if 16 bits together make one sample, then there are 2^16 (2 multiplied by itself 16 times, or 65,536) possible values that each sample could have. Thus, we can say that CD-quality audio has a sampling rate of 44,100 Hz and a bit depth or word length of 16 bits.

In contrast, professional music recordings are usually made at 24-bit first, to preserve the highest amount of detail, before being mixed down to 16-bit for CD, and older computer games were famous for having a distinctively rough 8-bit sound. By increasing the sampling rate, we are able to record higher sonic frequencies, and by increasing the bit depth or word length we are able to use a greater dynamic range (the difference between the quietest and the loudest sounds it is possible to record and play).
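The arithmetic behind these figures can be sketched as follows. The function names are hypothetical. The sketch relies on two standard results not spelled out in the text above: the highest frequency that can be represented is half the sampling rate (the Nyquist frequency), and the theoretical dynamic range grows by roughly 6 dB for every extra bit.

```python
import math

def levels(bit_depth):
    """Number of distinct values a sample of this bit depth can take."""
    return 2 ** bit_depth

def dynamic_range_db(bit_depth):
    """Theoretical ratio between the largest and smallest step, in dB."""
    return 20.0 * math.log10(2 ** bit_depth)

def nyquist_hz(sample_rate):
    """Highest frequency that a given sampling rate can represent."""
    return sample_rate / 2.0

for bits in (8, 16, 24):
    print(bits, levels(bits), round(dynamic_range_db(bits), 1))

print(nyquist_hz(44100))  # upper frequency limit of CD-quality audio
```

Real converters and recordings fall somewhat short of these theoretical limits, so the numbers are best read as upper bounds.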

In Pd, the number used for each sample has a value between -1 and +1, which represents the greatest range of movement of our theoretical loudspeaker, with 0 representing the speaker at rest in the middle position.
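One way to relate the 16-bit values stored on a CD to the range between -1 and +1 is to scale them, as in the sketch below. This is an illustration under assumptions rather than a description of what Pd or any soundcard does internally: the function names are hypothetical, and the scaling factors (32768 in one direction, 32767 in the other) follow one common convention among several.

```python
def int16_to_float(sample):
    """Map a 16-bit integer sample (-32768..32767) into the range -1.0..+1.0."""
    return sample / 32768.0

def float_to_int16(value):
    """Map a value in -1.0..+1.0 to a 16-bit integer, clipping anything outside."""
    clipped = max(-1.0, min(1.0, value))
    return int(round(clipped * 32767.0))

print(int16_to_float(0))       # speaker at rest
print(int16_to_float(-32768))  # furthest movement in one direction
print(float_to_int16(1.0))     # furthest movement in the other direction
print(float_to_int16(2.0))     # out-of-range input is clipped
```

Values outside -1 to +1 cannot be stored in the 16-bit format, so they are clipped here, which is heard as distortion.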



When we ask Pd to play back this sound, it will read the samples back and send them to the soundcard. The soundcard then converts these numbers to an electrical current which causes the loudspeaker to vibrate the air in front of it and make a sound we can hear.