How can I shorten the microphone listening time of Python's SpeechRecognition library on a Raspberry Pi 4?

Ahua AIGC Lab

2026-4-29

Fixing overly long listening times in SpeechRecognition

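You are on the right track: the timeout and phrase_time_limit parameters of listen() are the key. Two details are probably keeping them from taking effect, and the default behavior of adjust_for_ambient_noise adds extra waiting time on top. Here is how to fix it step by step: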

1. Fix how `adjust_for_ambient_noise` is used

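Your current code assigns the return value of adjust_for_ambient_noise to audio. This method only calibrates the recognizer's energy threshold against ambient noise; it returns None, so there is nothing useful to store. By default it also listens for 1 second to sample the noise, which adds to the wait. Shorten the calibration, and do not assign its result to the audio variable that listen() fills in later: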

r.adjust_for_ambient_noise(source, duration=0.5)  # calibrate for 0.5 s instead of the default 1 s

2. Configure the `listen()` parameters correctly

The two parameters control different phases, and you need to understand both to bound the listening time precisely:

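timeout: the maximum time to wait for the user to start speaking; if no speech starts within it, listening stops and WaitTimeoutError is raised
phrase_time_limit: once the user has started speaking, the maximum length of speech to record; recording stops automatically when the limit is reached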
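
To make the difference concrete, here is a simplified model of the two phases. It is not the library's implementation: simulate_listen, WaitTimeout, the fixed 0.5 s chunks, the 1.0 s pause, and the energy values are all assumptions made up for illustration.

```python
class WaitTimeout(Exception):
    """Stand-in for sr.WaitTimeoutError in this toy model."""


def simulate_listen(energies, chunk_seconds=0.5, threshold=300,
                    timeout=None, phrase_time_limit=None, pause_seconds=1.0):
    """Toy model of listen(): return (phrase_start, phrase_end) in seconds.

    `energies` holds one loudness value per audio chunk. Returns None if the
    input ends before a phrase has been completed.
    """
    phrase_start = None
    silence = 0.0
    for i, energy in enumerate(energies):
        now = (i + 1) * chunk_seconds  # time at the end of this chunk
        loud = energy > threshold
        if phrase_start is None:
            # Waiting phase: only `timeout` applies here.
            if loud:
                phrase_start = i * chunk_seconds
            elif timeout is not None and now >= timeout:
                raise WaitTimeout(f"no speech within {timeout} s")
            continue
        # Recording phase: a long enough pause or `phrase_time_limit` ends it.
        silence = 0.0 if loud else silence + chunk_seconds
        if silence >= pause_seconds:
            return phrase_start, now
        if phrase_time_limit is not None and now - phrase_start >= phrase_time_limit:
            return phrase_start, now
    return None


# Speech starts after 1 s of silence and keeps going.
speech_after_1s = [0, 0] + [500] * 10
print(simulate_listen(speech_after_1s, timeout=3, phrase_time_limit=3))  # (1.0, 4.0)

# Silence only, no timeout: the model never gives up waiting.
print(simulate_listen([0] * 20, phrase_time_limit=3))  # None
```

In the first call the phrase ends 4 seconds after listening began, even though both limits are 3 seconds, because each limit covers only its own phase.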

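With timeout=3 and phrase_time_limit=3, the wait for speech is capped at 3 seconds and the recording at 3 seconds, so a single listen() call can take up to about 6 seconds; use smaller values if waiting plus recording must stay under 3 seconds in total. Set both parameters, and catch the timeout exception, otherwise the program will stop with an error: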

try:
    audio = r.listen(source, timeout=3, phrase_time_limit=3)
except sr.WaitTimeoutError:
    print("No speech detected within 3 seconds")
    continue  # go back to the loop and wait for the next input
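
Because timeout and phrase_time_limit apply to separate phases, one loop iteration can take roughly their sum plus the calibration time. If the whole iteration has to fit a fixed budget, the two values can be derived from it. A minimal sketch; split_budget and wait_share are hypothetical names, not part of SpeechRecognition:

```python
def split_budget(total_seconds, calibration_seconds=0.5, wait_share=0.5):
    """Split a per-iteration time budget into (timeout, phrase_time_limit).

    `wait_share` is the fraction of the remaining time spent waiting for
    speech to start; the rest is left for recording the phrase.
    """
    remaining = total_seconds - calibration_seconds
    if remaining <= 0:
        raise ValueError("calibration alone uses up the whole budget")
    timeout = remaining * wait_share
    phrase_time_limit = remaining - timeout
    return timeout, phrase_time_limit


timeout, phrase_time_limit = split_budget(3.0, calibration_seconds=0.5)
print(timeout, phrase_time_limit)  # 1.25 1.25
```

The two results can then be passed as r.listen(source, timeout=timeout, phrase_time_limit=phrase_time_limit).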

3. The complete revised code

Putting these changes together, the final version looks like this:

import speech_recognition as sr
r = sr.Recognizer()
speech = sr.Microphone(device_index=2)  # index 2 is specific to this setup; uncomment the next line to list devices
# print(sr.Microphone.list_microphone_names())

while True:
    with speech as source:
        print("say something!…")
        # Shorten the noise calibration to avoid unnecessary waiting
        r.adjust_for_ambient_noise(source, duration=0.5)
        try:
            # Wait at most 3 s for speech to start, then record at most 3 s of speech
            audio = r.listen(source, timeout=3, phrase_time_limit=3)
        except sr.WaitTimeoutError:
            print("No speech detected within 3 seconds, waiting for input again...")
            continue
        print("the audio has been recorded")
        # Speech recognition using Google Speech Recognition
        try:
            print("api is enabled")
            recog = r.recognize_google(audio, language='en-US')
            print("You said: " + recog)
        except sr.UnknownValueError:
            print("Google Speech Recognition could not understand audio")
        except sr.RequestError as e:
            print("Could not request results from Google Speech Recognition service; {0}".format(e))

Why didn't the earlier call work?

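Your earlier call, r.listen(source,None,3), only set phrase_time_limit=3. With timeout=None, the program waits indefinitely for as long as the user stays silent. That is why listening seemed to take close to 10 seconds: it was waiting for you to start speaking, and recording only began once it picked up a sound above the energy threshold. With timeout=3, the wait ends after 3 seconds of silence, which should resolve the delay you are seeing.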
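
To check whether a change actually shortens the wait on your Pi, it can help to time the call rather than estimate it. A minimal sketch using only the standard library; timed_call is a hypothetical helper, not part of SpeechRecognition:

```python
import time


def timed_call(func, *args, **kwargs):
    """Run func and return (result, error, elapsed_seconds).

    Exceptions are captured rather than raised, so a timeout can be
    timed the same way as a successful call.
    """
    start = time.monotonic()
    result, error = None, None
    try:
        result = func(*args, **kwargs)
    except Exception as exc:
        error = exc
    elapsed = time.monotonic() - start
    return result, error, elapsed


# Stand-in workload; swap in the real call when a microphone is attached.
result, error, elapsed = timed_call(time.sleep, 0.05)
print(f"took {elapsed:.2f} s, error={error!r}")
```

Inside the with block of the code above, this would be called as audio, error, elapsed = timed_call(r.listen, source, timeout=3, phrase_time_limit=3), and error would hold the WaitTimeoutError if no speech started in time.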

The question comes from Stack Exchange; it was asked by The White Cloud.