BTC640/Sound

From CDOT Wiki
Revision as of 10:14, 12 January 2012 by Andrew (talk | contribs) (Lab)

Lecture

Textbook chapter: 4 (though it's rather weak on this subject).

Sound is a wave in the air. Like any wave it has an amplitude and a frequency. The amplitude controls the loudness and the frequency controls the pitch.

People perceive higher frequency sounds more easily than low frequency sounds. So even if the amplitude is higher, that doesn't necessarily mean that the sound is louder - it also depends on the frequency.

Low frequency sound is also less directional: it's hard to figure out exactly where it's coming from.

Sound frequency is measured in Hertz (Hz). Humans can hear frequencies roughly between 20 Hz and 20 kHz.

You can use Audacity to generate tones of different frequencies to get a feel for what they sound like.

Uncompressed Sound

There are various ways of representing waves digitally. The usual choice is to represent a wave as a sine formula with some parameters. This doesn't work very well for sound waves because there are so many changes to the wave in each second.

The choice in an audio CD is to "sample" the wave repeatedly and record each sample as a number. An audio CD and a typical uncompressed digital audio file store 44,100 samples for each second of audio.
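As a rough illustration of sampling, the sketch below evaluates a sine wave 44,100 times per second and records each sample as a whole number. The 16-bit sample range and the function name sample_tone are assumptions made for this example, not something defined in the lecture.

```python
import math

SAMPLE_RATE = 44100   # samples per second, as on an audio CD
MAX_16BIT = 32767     # largest signed 16-bit value (assumed sample size)

def sample_tone(frequency_hz, loudness, duration_s):
    """Sample a sine wave; loudness is a fraction (0.0 to 1.0) of full scale."""
    count = round(duration_s * SAMPLE_RATE)
    return [
        round(loudness * MAX_16BIT
              * math.sin(2 * math.pi * frequency_hz * n / SAMPLE_RATE))
        for n in range(count)
    ]

# One second of a 440 Hz tone at half of full amplitude.
samples = sample_tone(440, 0.5, 1.0)
print(len(samples))   # 44100
print(samples[:5])    # the first few recorded numbers
```

Changing frequency_hz changes the pitch and changing loudness changes the amplitude, but the number of samples per second stays the same.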

The number of samples per second is also measured in Hz, so the 44100 above is typically written as 44.1 kHz. This measurement is not related to the frequency/pitch of the sound even though it uses the same unit.

44.1 kHz was chosen for raw audio because a sampled wave can only represent frequencies up to half the sample rate (about 22 kHz), and people cannot hear frequencies higher than that. Listeners therefore cannot tell that the reconstructed wave is different from the original. Some claim that they can distinguish a live performance from an audio CD playback, but such claims are rare.
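To see why half the sample rate is the limit, here is a small sketch (the helper name sampled_cosine and the two test frequencies are chosen just for this example). It samples a 14.1 kHz tone and a 30 kHz tone at 44.1 kHz. Because 30000 = 44100 - 14100, the two tones produce the same list of samples, so a tone above half the sample rate cannot be told apart from a lower one once it has been sampled.

```python
import math

SAMPLE_RATE = 44100

def sampled_cosine(frequency_hz, count):
    """Return `count` samples of a cosine wave taken at SAMPLE_RATE."""
    return [math.cos(2 * math.pi * frequency_hz * n / SAMPLE_RATE)
            for n in range(count)]

below_half = sampled_cosine(14100, 100)   # below half the sample rate
above_half = sampled_cosine(30000, 100)   # above half the sample rate

# The two sample lists agree except for floating-point rounding.
largest_difference = max(abs(a - b) for a, b in zip(below_half, above_half))
print(largest_difference)
```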

WAV/AIFF

These two are basically identical formats, but because WAV has been the format supported by default in Windows, it's much more popular.

Sound is stored completely uncompressed, similar to how it's stored on Audio CDs. The format is not exactly the same though so conversion is necessary to transfer files from/to audio CDs.
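A minimal sketch of writing uncompressed samples to a WAV file, using the wave module from Python's standard library. The file name tone_example.wav, the mono 16-bit layout and the 440 Hz test tone are assumptions made for this example.

```python
import math
import os
import struct
import tempfile
import wave

SAMPLE_RATE = 44100

# One second of a 440 Hz tone as signed 16-bit numbers.
samples = [round(16000 * math.sin(2 * math.pi * 440 * n / SAMPLE_RATE))
           for n in range(SAMPLE_RATE)]

# Hypothetical output file in the system's temporary directory.
path = os.path.join(tempfile.gettempdir(), "tone_example.wav")

with wave.open(path, "wb") as wav_file:
    wav_file.setnchannels(1)            # mono
    wav_file.setsampwidth(2)            # 2 bytes = 16 bits per sample
    wav_file.setframerate(SAMPLE_RATE)
    # Pack the numbers as little-endian 16-bit integers, one after another.
    wav_file.writeframes(struct.pack("<%dh" % len(samples), *samples))

# The file is a small header followed by 2 bytes for every sample.
print(os.path.getsize(path))
```

Nothing is encoded or compressed here: the samples are simply written out after the header, which is why editing tools can read and write this format so cheaply.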

Because of their age and because they need no encoding or decoding, they are usually the default formats for audio creation and editing tools.

FLAC

FLAC is a lossless compressed format. A WAV file can be compressed with any generic compression tool, so you can think of FLAC playback as on-the-fly decompression. The difference is that FLAC has been optimised for better audio compression, seeking to random places in the file, and storing audio metadata.

Playing a FLAC file is more CPU-intensive than playing a WAV file, though, so the choice of format is a balance between the space required and the decoding speed.

The speed of the encoder also varies. You can create better compressed FLAC files by spending more time encoding them.

A FLAC file can be converted to any other lossless format such as WAV and back without any loss of audio data.
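The idea of a lossless round trip can be sketched with a generic compressor. Note that this uses zlib from Python's standard library as a stand-in, not the FLAC algorithm, and the 440 Hz test tone is an assumption for the example. Real music compresses far less well than a pure tone.

```python
import math
import struct
import zlib

SAMPLE_RATE = 44100

# One second of a 440 Hz tone as raw 16-bit sample data.
samples = [round(16000 * math.sin(2 * math.pi * 440 * n / SAMPLE_RATE))
           for n in range(SAMPLE_RATE)]
raw = struct.pack("<%dh" % len(samples), *samples)

# Level 9 spends the most time looking for a smaller result,
# similar in spirit to a slower, higher-compression encoder setting.
compressed = zlib.compress(raw, 9)
restored = zlib.decompress(compressed)

print(len(raw), len(compressed))   # the compressed data is smaller
print(restored == raw)             # True: every byte comes back unchanged
```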

Compressed Sound

An audio CD contains roughly as much data as a regular data CD - 650 or 700 MB. That's used to represent 74 or 80 minutes of sound, which is more than 8 MB per minute. Even with today's large hard drives and fast networks there are good reasons to compress sound: storage space is still an issue given enough audio, and for online streaming such sizes are still unacceptable even with high speed internet.
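The size of uncompressed audio can also be worked out from the sampling parameters. The sketch below assumes the standard audio CD layout of 16-bit samples in two channels (stereo), which is not stated above. The result is a little higher than the capacity of a data disc divided by the playing time, because a data disc reserves part of each sector for extra error correction.

```python
SAMPLE_RATE = 44100     # samples per second, per channel
BYTES_PER_SAMPLE = 2    # 16-bit samples (assumed, standard for audio CDs)
CHANNELS = 2            # stereo (assumed)

bytes_per_second = SAMPLE_RATE * BYTES_PER_SAMPLE * CHANNELS
megabytes_per_minute = bytes_per_second * 60 / 1_000_000
bitrate_kbps = bytes_per_second * 8 / 1000

print(bytes_per_second)       # 176400
print(megabytes_per_minute)   # 10.584
print(bitrate_kbps)           # 1411.2
```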

Just as with images there are lossless and lossy compression types.

MP3/OGG

The MP3 format is lossy but considerably cuts down the amount of disk space needed per second of audio. It first became popular when hard drives were small (a hard drive had about the capacity of an audio CD) and it was the only accessible way to store music on a computer. Later the popularity of the format exploded because of Napster, and these days every digital music player supports the format.

The compression algorithm is pretty complicated and we won't go through it in this course. Basically it relies on the fact that humans can't hear all sounds in all contexts.

OGG is a format essentially equivalent to MP3 that was developed because of the patents surrounding the use of MP3. Both OGG decoders (players) and encoders (creation tools) can be developed and distributed without a patent licence. Not all current players support OGG playback. The earlier iPods have been reported to be incapable of playing OGG files simply because their processors were too slow: OGG does require more processing power than MP3 to play.

Like WAV files, MP3/OGG files have a bitrate: the amount of data used for each second of audio. Older encoders were only capable of generating constant bitrate (CBR) files. Newer encoders can generate variable bitrate (VBR) files, where the encoder chooses the bitrate on the fly depending on how much information is in the current sound segment.
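The relationship between bitrate and file size is simple arithmetic, sketched below. The function names and the bitrates used (128 and 320 kbps, and the three VBR segment rates) are example values chosen here, not figures from the lecture.

```python
def cbr_size_megabytes(bitrate_kbps, seconds):
    """Size of a constant bitrate file: the same bitrate for every second."""
    bits = bitrate_kbps * 1000 * seconds
    return bits / 8 / 1_000_000

def vbr_size_megabytes(segment_bitrates_kbps, segment_seconds):
    """Size of a variable bitrate file: each segment has its own bitrate."""
    bits = sum(rate * 1000 * segment_seconds for rate in segment_bitrates_kbps)
    return bits / 8 / 1_000_000

# A 4-minute (240 second) song at two constant bitrates.
print(cbr_size_megabytes(128, 240))   # 3.84
print(cbr_size_megabytes(320, 240))   # 9.6

# The same song as three 80-second segments: quiet, busy, average.
print(vbr_size_megabytes([96, 160, 128], 80))   # 3.84
```

In this made-up example the VBR file ends up the same size as the 128 kbps CBR file, but it spends more of its bits on the busy segment and fewer on the quiet one.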

Because (1) the encoding process is much more resource-intensive than decoding and (2) the format is lossy, these formats are usually used only for distribution, not for recording or editing.

Other Compressed Audio Formats

There are lots of them out there; see http://en.wikipedia.org/wiki/Comparison_of_audio_codecs for a list. One other worth mentioning here is AAC - a format mostly used by Apple products. It has some benefits over MP3 but is roughly as popular as OGG (both are far less used than MP3).

Electronic Sound - MIDI

MIDI is actually much more than just a file format: it's an interchange format that's used in many electronic sound instruments such as synthesizers.

Instead of recording real sound (and often there is no real sound to record in the first place, the music is electronically generated) the audio is stored as a set of instructions, something like sheet music. The instructions include the instrument type, length and amplitude of each note. It is up to the player to combine (like an orchestra) all those instructions into a single song.
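The "instructions instead of sound" idea can be sketched as a small data structure. This is only an illustration, not the actual MIDI file layout: the NoteEvent class and its field names are hypothetical, and real MIDI identifies instruments by number rather than by name. The note numbering (69 is the A at 440 Hz, 12 steps per octave) is the standard MIDI convention.

```python
from dataclasses import dataclass

@dataclass
class NoteEvent:
    """One instruction for the player (hypothetical, simplified layout)."""
    instrument: str      # real MIDI uses a program number instead of a name
    note_number: int     # MIDI note number, 60 = middle C
    length_beats: float  # how long the note is held
    velocity: int        # how hard the note is played, 0 to 127

def note_frequency_hz(note_number):
    """Pitch the player should produce: note 69 is A at 440 Hz, 12 notes per octave."""
    return 440.0 * 2 ** ((note_number - 69) / 12)

# A three-note "song": a handful of numbers rather than thousands of samples.
song = [
    NoteEvent("piano", 60, 1.0, 90),
    NoteEvent("piano", 64, 1.0, 90),
    NoteEvent("piano", 69, 2.0, 110),
]

for event in song:
    print(event.instrument, event.note_number,
          round(note_frequency_hz(event.note_number), 1))
```

It is the player's job to turn each instruction into actual sound, which is why the same file can sound different on different players.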

MIDI files on computers were much more popular when the internet was slow and hard drives were small, because they are tiny compared even to the best compressed audio file. It was common to hear a MIDI file playing when visiting an early webpage. These days support for MIDI on computers is spotty because it's not a big requirement and sound cards no longer have MIDI hardware on them like they used to.

Metadata

Most audio formats (but not WAV and AIFF) can have metadata associated with them. This usually includes the artist, album, and track name, the year, and the genre.

Most playback software can be used to edit the metadata. Format conversion software usually also copies the metadata into the appropriate fields.

Degree Students

Read the 6-page paper Effectiveness of Audio on Screen Captures in Software Application Instruction (Veronicas and Maushak, 2005).

Links

Lab

File:Pond-erosa puff.ogg

audacity gz compress wav, mp3, flac <audio> tags