Implementing a Ring Buffer for Built-in Microphone Audio in Python: Continuously Storing the Last 2 Seconds of Audio for a Callback Function

阿华AIGC实验室

2026-5-8

Hey there! Let's break this down step by step since you're new to Python and want to build a ring buffer for capturing the last 2 seconds of mic audio, plus wrap your head around how callbacks fit into this.

First: What's a Callback Function, Anyway?

A callback is just a function you define that another piece of code calls automatically when a specific event happens. For your audio use case: each time the audio library has a new small chunk of microphone data ready, it calls your callback function and passes that chunk in. This is perfect for your goal: the callback can keep adding new audio data to your ring buffer.
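Here is a minimal sketch of that idea without any audio hardware. The function `deliver_chunks` is a hypothetical stand-in for an audio library, and `on_chunk` is the callback handed to it; both names are made up for illustration.

```python
def deliver_chunks(chunks, callback):
    """Stand-in for an audio library: calls `callback` once per chunk."""
    for chunk in chunks:
        callback(chunk)

received = []

def on_chunk(chunk):
    """Our callback: just remember every chunk we are handed."""
    received.append(chunk)

# We never call on_chunk ourselves; deliver_chunks does it for us.
deliver_chunks([[0.1, 0.2], [0.3, 0.4]], on_chunk)
print(received)
```

The key point is that you only define `on_chunk` and pass it along; the other code decides when to call it.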

Second: Building a Ring Buffer for Audio

A ring buffer (or circular buffer) is a fixed-size storage space that overwrites the oldest data when it runs out of room. This lets you always keep the most recent 2 seconds of audio without using infinite memory. Here's how to build one tailored for audio:
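The overwrite-the-oldest behavior can be seen with Python's built-in `collections.deque`, which behaves like a small ring buffer when given a `maxlen`. This is only an illustration of the concept, assuming a capacity of 5 items; the class in Step 2 uses a NumPy array instead, which suits blocks of audio better.

```python
from collections import deque

ring = deque(maxlen=5)       # holds at most 5 items
for sample in range(8):      # push the values 0..7
    ring.append(sample)      # once full, each append drops the oldest item

print(list(ring))  # [3, 4, 5, 6, 7]
```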

Step 1: Calculate the Buffer Size

First, you need to know two key audio parameters:

Sample rate: How many audio samples are captured per second (common values are 44100 Hz or 48000 Hz)
Channels: 1 for mono, 2 for stereo

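For 2 seconds of audio, the buffer needs sample_rate * 2 frames (one frame holds one sample per channel), so the total number of sample values is:
total_samples = sample_rate * channels * 2

For example, 44100 Hz mono: 44100 * 1 * 2 = 88200 samples (and 88200 frames, since mono has one sample per frame).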
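The arithmetic can be checked with a small helper. This is a sketch: `buffer_size` is a hypothetical name, and the 4 bytes per sample assumes 32-bit float audio.

```python
def buffer_size(sample_rate, channels, duration_sec=2, bytes_per_sample=4):
    """Return (frames, sample values, bytes) needed for `duration_sec` of audio."""
    frames = int(sample_rate * duration_sec)   # one frame = one sample per channel
    samples = frames * channels                # individual sample values
    return frames, samples, samples * bytes_per_sample

print(buffer_size(44100, 1))  # (88200, 88200, 352800)
print(buffer_size(48000, 2))  # (96000, 192000, 768000)
```

Even for 48000 Hz stereo, 2 seconds of float32 audio is well under 1 MB, so keeping it in memory is cheap.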

Step 2: Implement the Ring Buffer Class

Here's a simple, thread-safe ring buffer (we add a lock because audio callbacks often run in a separate thread):

import numpy as np
from threading import Lock

class AudioRingBuffer:
    def __init__(self, sample_rate, channels, duration_sec=2):
        self.sample_rate = sample_rate
        self.channels = channels
        self.duration_sec = duration_sec
        # Number of frames (rows) the buffer holds; each row stores one sample per channel,
        # so channels must not be multiplied in here
        self.total_samples = int(sample_rate * duration_sec)
        # Initialize buffer with zeros, shape (frames, channels); float32 is sounddevice's default dtype
        self.buffer = np.zeros((self.total_samples, channels), dtype=np.float32)
        # Index of the row where the next chunk of data will be written
        self.write_ptr = 0
        # Becomes True once the buffer has been filled (wrapped around) at least once
        self.filled = False
        # Lock to prevent race conditions between callback and read operations
        self.lock = Lock()

    def add_data(self, new_data):
        """Add a new chunk of audio data, shape (frames, channels), to the buffer"""
        with self.lock:
            new_samples = new_data.shape[0]
            # A chunk at least as long as the buffer can only contribute its newest rows
            if new_samples >= self.total_samples:
                self.buffer[:] = new_data[-self.total_samples:]
                self.write_ptr = 0
                self.filled = True
                return
            end = self.write_ptr + new_samples
            if end > self.total_samples:
                # Split the data into two parts: what fits at the end, and what wraps to the start
                part1 = self.total_samples - self.write_ptr
                self.buffer[self.write_ptr:] = new_data[:part1]
                self.buffer[:new_samples - part1] = new_data[part1:]
            else:
                # Fit the whole chunk in one go
                self.buffer[self.write_ptr:end] = new_data
            if end >= self.total_samples:
                self.filled = True
            self.write_ptr = end % self.total_samples

    def get_recent_audio(self):
        """Retrieve up to 'duration_sec' seconds of the most recent audio, oldest first"""
        with self.lock:
            if not self.filled:
                # Buffer has not wrapped yet: everything written so far sits before write_ptr
                return self.buffer[:self.write_ptr].copy()
            # Buffer is full: the oldest row is at write_ptr, so read from there to the end,
            # then from the start up to write_ptr (np.concatenate already returns a new array)
            return np.concatenate([self.buffer[self.write_ptr:], self.buffer[:self.write_ptr]])
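To trace the wrap-around by hand, here is a scaled-down sketch of the split-write logic using a plain list of 5 slots instead of a NumPy array. The function `ring_write` and the tiny sizes are assumptions chosen for illustration; they are not part of the class above.

```python
def ring_write(buffer, write_ptr, chunk):
    """Write `chunk` into `buffer` starting at `write_ptr`, wrapping at the end.

    Returns the new write pointer. Assumes len(chunk) <= len(buffer).
    """
    size = len(buffer)
    end = write_ptr + len(chunk)
    if end > size:
        part1 = size - write_ptr             # how many items fit before the end
        buffer[write_ptr:] = chunk[:part1]   # fill the tail of the buffer
        buffer[:end - size] = chunk[part1:]  # wrap the rest to the start
    else:
        buffer[write_ptr:end] = chunk        # whole chunk fits in one go
    return end % size

buf = [0] * 5
ptr = 0
ptr = ring_write(buf, ptr, [1, 2, 3])   # buf = [1, 2, 3, 0, 0], ptr = 3
ptr = ring_write(buf, ptr, [4, 5, 6])   # buf = [6, 2, 3, 4, 5], ptr = 1
ordered = buf[ptr:] + buf[:ptr]         # oldest first: [2, 3, 4, 5, 6]
print(buf, ptr, ordered)
```

After the second write the value 1 has been overwritten by 6, and reading from the write pointer onward recovers the samples in chronological order.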

Third: Connect the Buffer to Audio Capture with a Callback

We'll use the sounddevice library (it's beginner-friendly for audio I/O). First install it:

pip install sounddevice numpy

Now, here's a complete example that captures mic audio, feeds it to the ring buffer via a callback, and lets you retrieve the last 2 seconds whenever you need:

import sounddevice as sd
import sys

# Configure audio parameters
SAMPLE_RATE = 44100
CHANNELS = 1

# Initialize our ring buffer for 2 seconds of audio
ring_buffer = AudioRingBuffer(SAMPLE_RATE, CHANNELS, duration_sec=2)

def audio_callback(indata, frames, time, status):
    """This is the callback function that sounddevice will call with new audio data"""
    if status:
        print(status, file=sys.stderr)
    # Add the new audio chunk to our ring buffer
    ring_buffer.add_data(indata)

# Start the audio stream with our callback
with sd.InputStream(samplerate=SAMPLE_RATE, channels=CHANNELS, callback=audio_callback):
    print("Capturing audio... Press Enter to get the last 2 seconds of audio")
    input()
    # Retrieve the most recent 2 seconds
    recent_audio = ring_buffer.get_recent_audio()
    # This is less than 2 seconds if you press Enter before the buffer has filled
    print(f"Retrieved {len(recent_audio) / SAMPLE_RATE:.2f} seconds of audio")
    
    # Here's where you could pass 'recent_audio' to your target callback function!
    # For example: your_callback_function(recent_audio)

How This Works:

The sd.InputStream starts capturing audio from your mic.
Every time a small chunk of audio is ready, it calls audio_callback and passes the chunk in indata.
The callback adds this chunk to the ring buffer.
When you press Enter, we fetch the last 2 seconds from the buffer and can use it however we need (like passing it to your target callback function).
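The same flow can be rehearsed without a microphone. In this sketch, `fake_stream` is a hypothetical stand-in for the input stream that feeds fixed-size chunks to a callback, a `deque` plays the role of the ring buffer, and `on_trigger` is a made-up target callback that receives the most recent samples. The tiny sample rate is an assumption to keep the numbers readable.

```python
from collections import deque

SAMPLE_RATE = 8            # tiny rate so the numbers are easy to read
WINDOW = SAMPLE_RATE * 2   # keep the last 2 "seconds" of samples

recent = deque(maxlen=WINDOW)

def audio_callback(chunk):
    """Called for every new chunk; appends its samples to the ring buffer."""
    recent.extend(chunk)

def fake_stream(total_samples, chunk_size, callback):
    """Hypothetical input stream: delivers the samples 0, 1, 2, ... in chunks."""
    for start in range(0, total_samples, chunk_size):
        stop = min(start + chunk_size, total_samples)
        callback(list(range(start, stop)))

def on_trigger(samples):
    """Hypothetical target callback: reports how much it got and the range covered."""
    return len(samples), samples[0], samples[-1]

fake_stream(total_samples=40, chunk_size=3, callback=audio_callback)
result = on_trigger(list(recent))   # hand a snapshot of the buffer to the target
print(result)  # (16, 24, 39)
```

Passing a snapshot (`list(recent)`) rather than the buffer itself means the target callback works on stable data even if new chunks keep arriving.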

If You Have Existing Code...

If you're working with a codebase you don't fully understand yet, feel free to share snippets of it! We can adapt this ring buffer and callback logic to fit what you're already working with.

The question comes from Stack Exchange; it was asked by Sparky.