Audio forensics begins with understanding digital audio at a technical level. This module covers the fundamentals of digital audio representation, frequency-domain analysis, and the tools you need to analyze audio content for signs of synthesis, manipulation, or tampering.
Key takeaway: Audio forensics operates primarily in the frequency domain — analyzing spectrograms rather than waveforms. Understanding sampling rates, bit depth, frequency components, and compression artifacts gives you the foundation to detect voice cloning, audio splicing, and synthesis artifacts.
Digital Audio Basics
Sound is a continuous wave of air pressure variations. Digital audio represents this wave by measuring ("sampling") the pressure level at regular intervals and storing each measurement as a number. Two parameters define the quality of this representation.
Sample Rate
How many times per second the audio is measured. CD quality is 44,100 Hz (44.1 kHz), meaning 44,100 measurements per second. The Nyquist-Shannon sampling theorem says a given sample rate can represent frequencies up to half that rate (22,050 Hz at 44.1 kHz), which covers the roughly 20 Hz to 20 kHz range of human hearing.
Common rates: 8 kHz (phone), 16 kHz (voice), 44.1 kHz (CD), 48 kHz (video), 96 kHz (studio)
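The half-the-sample-rate rule can be checked with a few lines of arithmetic. The sketch below uses the common rates listed above; the function and dictionary names are illustrative only.

```python
# Common sample rates from the list above, in Hz.
COMMON_RATES_HZ = {
    "phone": 8_000,
    "voice": 16_000,
    "CD": 44_100,
    "video": 48_000,
    "studio": 96_000,
}


def nyquist_hz(sample_rate_hz: int) -> float:
    """Highest frequency a given sample rate can represent (half the rate)."""
    return sample_rate_hz / 2


for label, rate in COMMON_RATES_HZ.items():
    print(f"{label:>6}: {rate:>6} Hz sample rate -> content up to {nyquist_hz(rate):.0f} Hz")
```

For an analyst, this sets an upper bound: a file cannot contain genuine content above its Nyquist frequency.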
Bit Depth
How precisely each sample is measured. 16-bit audio has 65,536 possible values per sample, giving ~96 dB of dynamic range. 24-bit has about 16.7 million values (~144 dB). Higher bit depth means a lower quantization noise floor and more detail in quiet passages.
Common depths: 8-bit (low quality), 16-bit (CD), 24-bit (studio), 32-bit float (processing)
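The dynamic range figures above follow from the number of quantization levels: an N-bit sample has 2^N levels, and the ratio between the largest and smallest step is 20·log10(2^N), or roughly 6.02 dB per bit. A minimal sketch of that arithmetic (function names are illustrative, and this is the idealized figure for integer PCM, not a measurement of any real converter):

```python
import math


def quantization_levels(bit_depth: int) -> int:
    """Number of distinct values an integer sample of this depth can take."""
    return 2 ** bit_depth


def dynamic_range_db(bit_depth: int) -> float:
    """Idealized dynamic range: ratio of full scale to one step, in dB."""
    return 20 * math.log10(quantization_levels(bit_depth))


for bits in (8, 16, 24):
    print(f"{bits:>2}-bit: {quantization_levels(bits):>12,} levels, "
          f"~{dynamic_range_db(bits):.1f} dB")
```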
Understanding Spectrograms
A spectrogram is the forensic analyst's primary visualization tool. It displays three dimensions of audio simultaneously: time (horizontal axis), frequency (vertical axis), and amplitude (color intensity). Learning to read spectrograms is the single most important skill in audio forensics.
Spectrograms reveal patterns invisible in waveform view. Human speech shows characteristic formant bands (resonant frequencies of the vocal tract), breathing patterns, lip sounds, and natural background noise. Synthetic speech may lack these natural features or display them with artificial uniformity.
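To make the three axes concrete, here is a minimal sketch of how a spectrogram can be computed: slice the signal into overlapping frames, apply a window, and take the magnitude of each frame's FFT. It assumes NumPy is available and that the audio is already loaded as a mono float array; the frame length, hop size, and test tone are arbitrary choices for illustration, not recommended forensic settings.

```python
import numpy as np


def spectrogram_db(samples, sample_rate, frame_len=1024, hop=256):
    """Return (times_s, freqs_hz, magnitude_db) for a mono float signal."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(samples) - frame_len) // hop
    # One windowed frame per row.
    frames = np.stack([
        samples[i * hop : i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    magnitude_db = 20 * np.log10(magnitude + 1e-10)  # small offset avoids log(0)
    freqs_hz = np.fft.rfftfreq(frame_len, d=1 / sample_rate)
    times_s = (np.arange(n_frames) * hop + frame_len / 2) / sample_rate
    # Transpose so rows are frequency (vertical axis) and columns are time.
    return times_s, freqs_hz, magnitude_db.T


# Stand-in signal: one second of a 1 kHz tone sampled at 16 kHz.
sample_rate = 16_000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 1000 * t)

times_s, freqs_hz, spec_db = spectrogram_db(tone, sample_rate)
peak_bin = spec_db.mean(axis=1).argmax()
print(f"spectrogram shape (freq bins, time frames): {spec_db.shape}")
print(f"strongest band near {freqs_hz[peak_bin]:.0f} Hz")
```

Passing `spec_db` to an image plot, with `times_s` and `freqs_hz` as the axes, gives the familiar color-mapped view that tools such as Audacity display.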
What to Look For in Spectrograms
Natural Speech Indicators
- Irregular breath patterns between phrases
- Variable formant transitions
- Ambient noise floor continuity
- Micro-hesitations and filler sounds
Synthetic Speech Indicators
- Perfectly regular formant spacing
- Absence of natural breathing
- Clean noise floor (suspiciously quiet)
- Uniform prosody and pacing
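The "suspiciously quiet" noise floor indicator can be roughly quantified by measuring the level of the quietest short frames in a recording. The sketch below is one possible approach, not an established forensic procedure: it assumes NumPy, mono float samples scaled to the range -1 to 1, and an arbitrary 10th-percentile cut. The two test signals are synthetic stand-ins (tone bursts with and without added ambient noise), not real speech.

```python
import numpy as np


def frame_levels_dbfs(samples, frame_len=512):
    """RMS level of each non-overlapping frame, in dB relative to full scale."""
    n_frames = len(samples) // frame_len
    frames = np.reshape(samples[: n_frames * frame_len], (n_frames, frame_len))
    rms = np.sqrt(np.mean(frames ** 2, axis=1))
    return 20 * np.log10(rms + 1e-10)  # small offset avoids log(0)


def noise_floor_dbfs(samples, frame_len=512, percentile=10):
    """Estimate the noise floor as a low percentile of the frame levels."""
    return float(np.percentile(frame_levels_dbfs(samples, frame_len), percentile))


rng = np.random.default_rng(0)
sample_rate = 16_000
t = np.arange(2 * sample_rate) / sample_rate

# Stand-in for speech: a 200 Hz tone that is on for half of every second.
bursts = 0.5 * np.sin(2 * np.pi * 200 * t) * (t % 1.0 < 0.5)

room = bursts + 0.001 * rng.standard_normal(t.size)  # pauses contain ambient noise
silent = bursts.copy()                               # pauses are digital silence

print(f"with room noise: {noise_floor_dbfs(room):.1f} dBFS")
print(f"digital silence: {noise_floor_dbfs(silent):.1f} dBFS")
```

A very low floor on its own does not prove synthesis, since noise gates and denoisers produce the same effect, so a reading like this is best treated as one indicator among several.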
Essential Tools
| Tool | Type | Best For | Cost |
|---|---|---|---|
| Audacity | Audio editor | Spectrogram view, noise analysis, basic forensics | Free |
| Praat | Phonetics analysis | Formant analysis, pitch tracking, voice comparison | Free |
| SoX | CLI audio tool | Batch processing, format conversion, statistics | Free |
| iZotope RX | Pro forensics | Advanced spectral editing, noise profiling, repair | $399+ |
Audio Compression and Artifacts
Understanding audio compression is essential for forensic analysis because compression artifacts can reveal manipulation. Lossy formats (MP3, AAC, Ogg Vorbis) reduce file size by discarding frequency data that listeners are unlikely to notice, which leaves characteristic artifacts such as a high-frequency cutoff. When audio is re-encoded multiple times, these artifacts accumulate (generation loss) and become detectable. A splice between two segments compressed at different bitrates often shows up in a spectrogram as a visible boundary, for example an abrupt change in the highest frequency present.
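One way to look for such a boundary numerically is to estimate the highest frequency with meaningful energy on each side of a suspected splice point and compare the two. The sketch below assumes NumPy and mono float samples. The 40 dB threshold is an arbitrary illustrative choice, and the low-passed noise segments are synthetic stand-ins for two encodes with different cutoffs, not output from any real codec.

```python
import numpy as np


def effective_cutoff_hz(samples, sample_rate, frame_len=2048, drop_db=40.0):
    """Highest frequency whose average level is within drop_db of the peak."""
    window = np.hanning(frame_len)
    n_frames = len(samples) // frame_len
    frames = np.reshape(samples[: n_frames * frame_len], (n_frames, frame_len))
    power = np.mean(np.abs(np.fft.rfft(frames * window, axis=1)) ** 2, axis=0)
    level_db = 10 * np.log10(power + 1e-20)
    freqs_hz = np.fft.rfftfreq(frame_len, d=1 / sample_rate)
    above = np.nonzero(level_db > level_db.max() - drop_db)[0]
    return float(freqs_hz[above[-1]])


def lowpassed_noise(n, sample_rate, cutoff_hz, rng):
    """White noise with everything above cutoff_hz removed (test signal only)."""
    spectrum = np.fft.rfft(rng.standard_normal(n))
    spectrum[np.fft.rfftfreq(n, d=1 / sample_rate) > cutoff_hz] = 0
    return np.fft.irfft(spectrum, n)


rng = np.random.default_rng(1)
sample_rate = 44_100
n = 2 * sample_rate  # two seconds per segment

# Hypothetical recording: two segments with different bandwidths joined together.
recording = np.concatenate([
    lowpassed_noise(n, sample_rate, 16_000, rng),
    lowpassed_noise(n, sample_rate, 11_000, rng),
])

before = effective_cutoff_hz(recording[:n], sample_rate)
after = effective_cutoff_hz(recording[n:], sample_rate)
print(f"before suspected splice: ~{before:.0f} Hz, after: ~{after:.0f} Hz")
```

A large jump between the two estimates is a prompt for closer inspection rather than proof of a splice, because a change of speaker, microphone, or content can also shift the bandwidth.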
In the next module, Spectral Analysis Basics, you will apply these fundamentals to real-world voice analysis — reading spectrograms for formant patterns, detecting pitch manipulation, and identifying the spectral signatures of different voice synthesis technologies.