Unstable FFmpeg volumedetect Results: A Question About Volume Detection Differences Across Audio Containers

阿华AIGC实验室

2026-5-19

Why FFmpeg Volume Detection Differs Between OGG and MP3, and Which Result to Trust

Great question. This is a very common gotcha when working with lossy audio formats, and it boils down to three factors:

1. Compression Algorithms Alter the Underlying Audio Signal

OGG (typically paired with the Vorbis codec) and MP3 use fundamentally different lossy compression techniques:

Vorbis relies on a more modern psychoacoustic model that preserves more subtle audio details, especially in high-frequency ranges, compared to the older MP3 standard.
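When you convert an OGG to MP3, FFmpeg first decodes the Vorbis stream to PCM, and the MP3 encoder then discards whatever its own model deems "inaudible." This is not just about shrinking file size: it changes the decoded waveform itself, so individual sample peaks can end up slightly higher or lower than in the source.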

FFmpeg’s volumedetect filter calculates volume based on decoded PCM audio data (the raw samples sent to speakers). Since the MP3’s decoded PCM isn’t identical to the OGG’s (thanks to compression losses), their volume measurements will naturally differ.
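As a rough illustration of what such a measurement amounts to, here is a minimal Python sketch that computes a peak level and an RMS level in dBFS from signed 16-bit samples. It assumes the reported figures are, in essence, the peak and the mean power relative to 16-bit full scale. The function name and the sample values are made up for the example, and this is not FFmpeg's actual code.

```python
import math

FULL_SCALE = 32768.0  # magnitude of the most negative signed 16-bit sample


def peak_and_mean_db(samples):
    """Return (max_db, mean_db) for a sequence of signed 16-bit integers.

    Assumption: the peak is 20*log10(|peak| / full scale) and the mean is
    the RMS level, 10*log10(mean square / full scale squared).
    """
    if not samples:
        raise ValueError("need at least one sample")
    peak = max(abs(s) for s in samples)
    if peak == 0:
        # Pure digital silence has no finite level in dB.
        return float("-inf"), float("-inf")
    mean_square = sum(s * s for s in samples) / len(samples)
    max_db = 20.0 * math.log10(peak / FULL_SCALE)
    mean_db = 10.0 * math.log10(mean_square / (FULL_SCALE * FULL_SCALE))
    return max_db, mean_db


# Hypothetical samples, chosen so the peaks land roughly at -3.0 and -3.4 dBFS.
# The "lossy" copy stands in for a waveform altered by re-encoding.
original = [12000, -23197, 8000, -15000]
lossy_copy = [11950, -22150, 8100, -14900]

print(peak_and_mean_db(original))
print(peak_and_mean_db(lossy_copy))
```

Changing a single peak sample by a few percent is enough to move the reported maximum by several tenths of a dB, which is the order of difference discussed here.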

2. Decoder Defaults Introduce Subtle Processing Differences

FFmpeg uses separate decoders for Vorbis and MP3, each with its own default settings that can tweak the output PCM:

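For example, MP3 decoding includes steps (such as alias reduction in its hybrid filter bank) that have no direct counterpart in Vorbis decoding, and MP3 files normally carry encoder delay and padding, i.e. short stretches of near-silence at the start and end, which can nudge the average volume.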
Even minor variations in how each decoder handles bitstream edge cases can lead to small but measurable changes in peak and average volume.

3. Metadata-Driven Gain Adjustments (A Less Likely But Possible Factor)

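Some audio files include ReplayGain metadata that tells players to adjust volume for consistent playback. The volumedetect filter measures decoded samples and does not itself act on those tags, so this only matters if something earlier in your filter chain applies the gain (for instance the volume filter with its replaygain option), or if a conversion tool baked the gain into the MP3 samples. In that case the adjustment shows up in the detected levels.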
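To make the effect of such a gain concrete: a gain of G dB multiplies every sample by 10^(G/20), so any peak or average level measured afterwards moves by the same G dB. A small Python sketch of that arithmetic follows; the peak amplitude and the -2.5 dB gain are hypothetical values picked for illustration only.

```python
import math


def db_to_linear(gain_db):
    """Convert a gain in decibels to a linear amplitude factor."""
    return 10.0 ** (gain_db / 20.0)


def level_db(amplitude, full_scale=1.0):
    """Level of a positive amplitude relative to full scale, in dB."""
    return 20.0 * math.log10(amplitude / full_scale)


# Hypothetical numbers: a peak at 0.7 of full scale and a -2.5 dB gain.
peak = 0.7
gain_db = -2.5

before = level_db(peak)
after = level_db(peak * db_to_linear(gain_db))

# The measured level shifts by the applied gain (up to rounding error).
print(round(after - before, 6))
```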

Which Result Should You Trust?

It depends on your goal:

If you care about the original audio’s true volume (from the OGG file you started with), trust the OGG’s -3.0 dB result. The MP3 is a compressed derivative that has already altered the original signal.
If you need to know the volume listeners will actually hear when playing the MP3, trust the MP3’s -3.4 dB result. This is the exact PCM level that will come out of speakers when the MP3 is decoded.

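Pro tip: To compare the decoded signals directly, decode both files to uncompressed WAV first, using distinct output names so the second command does not overwrite the first (ffmpeg -i input.ogg from_ogg.wav and ffmpeg -i input.mp3 from_mp3.wav), then run volumedetect on the WAVs. The difference will still exist (since MP3 compression changed the signal), but you will be measuring two plain PCM files, which takes the containers out of the picture. Keep in mind that this step still uses FFmpeg’s own Vorbis and MP3 decoders, so on its own it does not remove decoder-specific differences.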
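If you want to script the comparison, something along these lines could work. It is a sketch that assumes an ffmpeg binary is on your PATH and that the filter reports mean_volume and max_volume lines through FFmpeg's log on stderr; the helper names (parse_volumedetect, measure, compare) are invented for this example.

```python
import re
import subprocess

# Matches log lines of the form "mean_volume: <number> dB" / "max_volume: <number> dB".
_VOLUME_RE = re.compile(r"\b(mean_volume|max_volume):\s*(-?\d+(?:\.\d+)?)\s*dB")


def parse_volumedetect(log_text):
    """Extract mean_volume and max_volume (in dB) from volumedetect log text."""
    return {name: float(value) for name, value in _VOLUME_RE.findall(log_text)}


def measure(path):
    """Run volumedetect on one file and return its parsed levels.

    Assumes `ffmpeg` is installed and on PATH.
    """
    cmd = [
        "ffmpeg", "-hide_banner", "-nostats",
        "-i", path,
        "-vn",                  # ignore any video or cover-art stream
        "-af", "volumedetect",
        "-f", "null", "-",      # decode and measure, write no output file
    ]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return parse_volumedetect(result.stderr)


def compare(path_a, path_b):
    """Return the per-metric difference (b minus a) in dB."""
    a, b = measure(path_a), measure(path_b)
    return {key: round(b[key] - a[key], 2) for key in a if key in b}


# Example call (file names are placeholders):
# print(compare("input.ogg", "input.mp3"))
```

Parsing is kept separate from running FFmpeg so that the same function can be applied to log text you have already captured.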

The question in this post comes from Stack Exchange and was asked by Eric Stdlib.