To accurately capture all frequencies in a signal, the sampling rate must exceed twice the highest frequency:
f_s > 2 × f_max
where f_s is the sample rate and f_max is the highest frequency contained in the signal.
Maximum capturable frequency (Nyquist frequency): f_Nyquist = f_s / 2
→ Signal frequencies exceeding half the sample rate cause aliasing.
Undersampling (top) vs. correct sampling (bottom) of a sine wave
When sampling a signal, frequencies above the Nyquist frequency are reflected back into the audible range, creating unwanted artifacts.
Example:
A 30 kHz tone sampled at 44.1 kHz (Nyquist = 22.05 kHz) appears as 14.1 kHz: the tone exceeds Nyquist by 7.95 kHz and lands that far below it, i.e. f_alias = f_s − f = 44.1 kHz − 30 kHz = 14.1 kHz.
The tone is mirrored at the Nyquist frequency and folded back into the useful spectrum.
Sine sweep exceeding Nyquist frequency is mirrored back into the audible range.
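The folding behaviour can be sketched in a few lines of Python (the function name is mine, not from any library):

```python
def alias_frequency(f, fs):
    """Fold a tone of frequency f (Hz) into the band [0, fs/2],
    mirroring at the Nyquist frequency, as an ideal sampler would."""
    f = f % fs                      # sampling is periodic in fs
    return fs - f if f > fs / 2 else f

print(alias_frequency(30_000, 44_100))  # → 14100
```

This reproduces the example above: the 30 kHz tone folds down to 14.1 kHz.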
An analog low-pass (anti-aliasing) filter placed before the ADC attenuates frequencies above the Nyquist frequency to prevent them from folding back into the desired signal band.
Ideal brick-wall filter: passes everything below the cutoff and removes everything above it with an infinitely steep slope (not physically realizable).
Real analog filters: roll off gradually and need a transition band above the audio range.
Sample rate must therefore exceed the Nyquist rate (2× the highest signal frequency) by a sufficient margin to accommodate the filter slope.
Sample values might be at 0 dBFS, but the reconstructed waveform between them (inter-sample peaks) can exceed this, potentially causing clipping or distortion.
→ Leave 1 to 2 dB of headroom below 0 dBFS during mastering/export
Inter-sample peak
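A minimal sketch of the effect: a tone at a quarter of the sample rate, phase-shifted so the samples land between the waveform's peaks. The numbers (fs, phase) are chosen for illustration only.

```python
import math

fs = 48_000
f = fs / 4            # tone at a quarter of the sample rate
phase = math.pi / 4   # chosen so samples fall between the true peaks

samples = [math.sin(2 * math.pi * f * n / fs + phase) for n in range(8)]
peak_sample = max(abs(s) for s in samples)
print(round(peak_sample, 3))   # → 0.707  (≈ -3 dBFS)
# The continuous sine still reaches 1.0 between the samples, so the
# reconstructed waveform peaks about 3 dB above the highest stored sample.
```

If such a signal were normalized so its samples hit 0 dBFS, the reconstruction would overshoot by roughly 3 dB.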
Oversampling processes audio at a higher internal sample rate than the project rate.
Besides the time-domain sampling, the second important step to digitize a signal is amplitude-domain quantization:
Each sample is rounded to the nearest amplitude value set by the bit depth, introducing small quantization errors. (Quantization and bit depth)
Digital systems can only represent numbers with finite (limited) precision.
Quantization maps each sample to the nearest value within a finite set of amplitude levels.
Examples (from left to right):
1-bit quantization (2 levels)
2-bit quantization (4 levels)
3-bit quantization (8 levels)
4-bit quantization (16 levels)
8-bit quantization (256 levels)
Quantization error
The difference between the actual amplitude (blue) and the quantized value (stepped red line) is the quantization error (green).
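The rounding step can be sketched directly (a toy uniform quantizer for values in [-1, 1); the function name is mine):

```python
def quantize(x, bits):
    """Round x in [-1, 1) to the nearest of 2**bits amplitude levels."""
    step = 1 / 2 ** (bits - 1)      # step size for a signed range
    return round(x / step) * step

x = 0.3
q = quantize(x, 3)          # 3 bits → 8 levels, step 0.25
print(q, round(x - q, 2))   # → 0.25 0.05
```

The quantization error is at most half a step (here ±0.125); more bits shrink the step and thus the error.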
A DAC transforms digital signals (binary data) back into continuous analog waveforms.
Quantization creates systematic rounding errors that produce audible distortion at low signal levels.
Dither adds very low-level noise before quantization to randomize these errors, transforming harsh distortion into low background noise.
→ Always dither when exporting to lower bit depth to preserve low-level detail.
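A minimal sketch of dithered requantization, assuming TPDF (triangular) dither of ±1 LSB, which is the common choice; real converters often add noise shaping, which this omits:

```python
import random

def dither_and_quantize(x, bits, rng=random.Random(0)):
    """Requantize x in [-1, 1) to `bits`, adding TPDF dither first.
    Sketch only: the seeded rng is for reproducibility, not practice."""
    step = 1 / 2 ** (bits - 1)
    tpdf = (rng.random() - rng.random()) * step   # triangular, ±1 LSB
    return round((x + tpdf) / step) * step

q = dither_and_quantize(0.3, 3)
# q lands on one of the neighbouring 3-bit levels (0.0, 0.25, 0.5);
# across many samples the rounding error averages out as noise instead
# of correlating with the signal.
```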
Dynamic range (DR): theoretical maximum determined by bit depth, DR ≈ 6.02 × N + 1.76 dB for N bits
→ 24-bit audio enables a ~146 dB dynamic range, corresponding to the span from a whisper (minimum) to a jet engine at close range (maximum).
| Bits | SNR (audio) | Minimum amplitude step (dB) | Possible values per sample |
|---|---|---|---|
| 8 | 49.93 dB | 0.1948 dB | 256 |
| 16 | 98.09 dB | 0.00598 dB | 65,536 |
| 24 | 146.26 dB | 0.00000871 dB | 16,777,216 |
| 32 | 194.42 dB | 0.0000000452 dB | 4,294,967,296 |
→ Dynamic range of humans: threshold of hearing to threshold of pain ≈ 120 dB
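The SNR column of the table can be reproduced from the standard formula for an ideal quantizer driven by a full-scale sine (6.02 × N + 1.76 dB, with exact constants):

```python
import math

def snr_db(bits):
    """Theoretical SNR of an N-bit quantizer with a full-scale sine:
    20*log10(2)*N + 10*log10(1.5) ≈ 6.02*N + 1.76 dB."""
    return 20 * math.log10(2) * bits + 10 * math.log10(1.5)

for n in (8, 16, 24, 32):
    print(n, round(snr_db(n), 2))   # matches the SNR column above
```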
Clipping is a distortion of the waveform that occurs when a signal exceeds the maximum level an electronic or digital system can represent.
Hard clipping vs. soft clipping
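The two behaviours can be sketched per-sample; tanh is just one common soft-clip curve, chosen here for illustration:

```python
import math

def hard_clip(x, threshold=1.0):
    """Abruptly limit the signal: flat tops, harsh high harmonics."""
    return max(-threshold, min(threshold, x))

def soft_clip(x):
    """Gradually compress peaks; rounded tops, milder harmonics."""
    return math.tanh(x)

print(hard_clip(1.5))            # → 1.0
print(round(soft_clip(1.5), 3))  # → 0.905
```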
Fixed Point (16/24-bit): all values share one fixed scale; signals clip hard at 0 dBFS.
Floating Point (32/64-bit): mantissa plus exponent provide enormous internal headroom; levels above 0 dBFS survive processing without clipping.
→ DAWs usually process at 32-bit float.
Bit rate (amount of data per second): sample rate × bit depth × number of channels
File size (total data for duration): bit rate × duration in seconds / 8 (bits → bytes)
Example: 1 minute stereo, 48 kHz, 24-bit
= (48,000 × 24 × 2 × 60) / 8 = 8,640,000 bytes ≈ 8.64 MB
For a one-minute file:
| Channels | Sample Rate (kHz) | Bit Depth | File Size (MB) |
|---|---|---|---|
| 1 | 44.1 | 16 | 5.29 MB |
| 1 | 44.1 | 24 | 7.94 MB |
| 1 | 48 | 24 | 8.64 MB |
| 1 | 48 | 32 float | 11.52 MB |
| 1 | 96 | 24 | 17.28 MB |
| 1 | 96 | 32 float | 23.04 MB |
| 2 | 48 | 24 | 17.28 MB |
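The table values follow directly from the formula above (uncompressed PCM, MB = 10^6 bytes; the function name is mine):

```python
def file_size_mb(channels, sample_rate, bit_depth, seconds):
    """Uncompressed PCM size: bit rate x duration, converted to MB."""
    bits = channels * sample_rate * bit_depth * seconds
    return bits / 8 / 1_000_000

print(round(file_size_mb(1, 44_100, 16, 60), 2))  # → 5.29
print(round(file_size_mb(2, 48_000, 24, 60), 2))  # → 17.28
```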
DC offset occurs when a waveform has a non-zero average value, shifting the entire signal away from the zero line.
Waveform with DC offset
→ Apply DC offset removal / high-pass filter (e.g., 20 Hz).
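The simplest correction is subtracting the mean (a sketch; real tools typically use a gentle high-pass instead, as noted above):

```python
def remove_dc(samples):
    """Subtract the mean so the waveform is centred on the zero line."""
    mean = sum(samples) / len(samples)
    return [s - mean for s in samples]

shifted = [0.6, 0.1, 0.6, 0.1]              # average 0.35, not zero
centred = remove_dc(shifted)
print([round(s, 2) for s in centred])       # → [0.25, -0.25, 0.25, -0.25]
```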
Every project defines two key parameters that determine audio quality and file size: the sample rate and the bit depth.
→ Recommended setting: 48 kHz / 24-bit
Latency is the time between an audio signal entering and leaving the system.
Main causes: AD/DA conversion, plugin/DSP processing, and the audio buffer
Some latency is unavoidable (conversion, processing) while buffer latency is adjustable.
Digital audio systems use buffers (small sections of temporary memory) to process audio in blocks. The buffer size defines how many samples the system processes at once and directly affects both latency (delay) and system stability.
Small buffers (64–128 samples): Low latency for recording and live monitoring, higher CPU load, risk of dropouts
Large buffers (512–1024 samples): Higher latency, lower CPU load, stable playback for mixing
→ Use the smallest buffer size that avoids dropouts for the given task.
Example:
Buffer = 128 samples at 48 kHz:
Delay = (128 / 48000) × 1000 = 2.67 ms
Humans can detect a silent gap of about 2–3 ms between two similar sounds.
If the sounds are less similar, presented in noise or at lower intensity, or have less pronounced attack phases, the threshold increases (≥ 4–5 ms).
→ Buffer settings around 128 samples (≈ 2.7 ms at 48 kHz) feel immediate to most musicians during recording
| Buffer size (samples) | Delay in ms at 44.1 kHz | Delay in ms at 48 kHz |
|---|---|---|
| 32 | 0.73 | 0.67 |
| 64 | 1.45 | 1.33 |
| 128 | 2.90 | 2.67 |
| 256 | 5.80 | 5.33 |
| 512 | 11.61 | 10.67 |
| 1024 | 23.22 | 21.33 |
| 2048 | 46.44 | 42.67 |
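The table follows from the same formula as the example above (one buffer's worth of delay; real round-trip latency adds converter and driver overhead on top):

```python
def buffer_latency_ms(buffer_samples, sample_rate):
    """Delay contributed by one audio buffer, in milliseconds."""
    return buffer_samples / sample_rate * 1000

print(round(buffer_latency_ms(128, 48_000), 2))   # → 2.67
print(round(buffer_latency_ms(2048, 44_100), 2))  # → 46.44
```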
Well-optimized digital systems introduce less latency than the acoustic delay caused by the physical distance between performers and their monitor speakers (speed of sound ≈ 343 m/s in air).
Acoustic propagation delay: distance / 343 m/s, i.e. roughly 2.9 ms per metre
→ Digital audio latency (3–10 ms) is comparable to or shorter than acoustic delays musicians naturally encounter.
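For comparison, the acoustic delay over a given distance (the 3 m example distance is mine):

```python
SPEED_OF_SOUND_M_S = 343.0   # in air, at roughly room temperature

def acoustic_delay_ms(distance_m):
    """Time for sound to travel distance_m metres through air."""
    return distance_m / SPEED_OF_SOUND_M_S * 1000

print(round(acoustic_delay_ms(1.0), 1))  # → 2.9
print(round(acoustic_delay_ms(3.0), 1))  # → 8.7  (e.g. an amp across the room)
```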
Pulse-Code Modulation (PCM):
Analog signal amplitude is sampled at uniform intervals and each sample is quantized to the nearest digital step.
→ PCM = sampling + quantization.
Uncompressed (PCM): e.g. WAV, AIFF
Lossless compression: e.g. FLAC, ALAC
Lossy compression: e.g. MP3, AAC, Ogg Vorbis
Word clock is used when multiple digital devices (interface and converters) are connected: one device acts as the clock master and all others lock to its timing.
Jitter is unwanted timing variation in the digital audio clock, causing samples to be processed at incorrect times and potentially introducing distortion.
Synchronization aligns multiple devices to a common clock to minimize jitter and ensure stable audio transfer.
Jitter
Original content: © 2025 Lorenz Schwarz
Licensed under CC BY 4.0. Attribution required for all reuse.
Includes: text, diagrams, illustrations, photos, videos, and audio.
Third-party materials: Copyright respective owners, educational use.
Contact: lschwarz@hfg-karlsruhe.de