How do I use the Google speech recognition API offline? My existing Python code only works online
Hey there! The recognize_google() function you're using sends audio to Google's cloud-based speech recognition service, which is why it only works when you're online. For offline recognition you can use CMU PocketSphinx, an open-source offline speech engine that the speech_recognition library you're already using supports through recognize_sphinx().
Step 1: Install Required Dependencies
First, you'll need to install the PocketSphinx package (it's not included with speech_recognition by default):
pip install pocketsphinx
Note: On Linux systems, you might need to install system-level dependencies first, like libpulse-dev (for Ubuntu/Debian: sudo apt-get install libpulse-dev).
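Before running the recognition code, a quick check like the one below can confirm that both packages are importable. This is only a sketch: missing_packages is a helper name made up for illustration, and it assumes the import names are speech_recognition and pocketsphinx.

```python
import importlib.util


def missing_packages(names=("speech_recognition", "pocketsphinx")):
    """Return the import names from `names` that cannot be found."""
    # find_spec() returns None for a top-level module that is not installed.
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Not installed:", ", ".join(missing))
    else:
        print("Both packages are importable.")
```

If pocketsphinx shows up as missing, rerun the pip command above in the same environment your script uses.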
Step 2: Modified Offline Code
Here's your updated code that uses offline recognition instead of Google's online API. I've added error handling too, which helps catch cases where speech can't be recognized:
import speech_recognition as sr

r = sr.Recognizer()

with sr.Microphone() as source:
    # Adjust for ambient noise first (boosts recognition accuracy)
    r.adjust_for_ambient_noise(source, duration=0.5)
    print('Say Something!')
    audio = r.listen(source)
    print('Done!')

try:
    # Use PocketSphinx for offline speech recognition
    text = r.recognize_sphinx(audio)
    print(f"Recognized: {text}")
except sr.UnknownValueError:
    print("Sorry, I couldn't understand what you said.")
except sr.RequestError as e:
    print(f"Error with the offline recognition engine: {e}")
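If you need the recognition step in more than one place, it can be wrapped in a small helper. The sketch below is one possible shape, not part of speech_recognition itself: the name recognize_offline and the convention of returning None on failure are assumptions of mine. The guarded import is there only so the helper can be loaded on a machine where the library is not installed.

```python
try:
    import speech_recognition as sr
    UnknownValueError = sr.UnknownValueError
    RequestError = sr.RequestError
except ImportError:
    # Stand-in exception types so this sketch still loads without the library.
    class UnknownValueError(Exception):
        pass

    class RequestError(Exception):
        pass


def recognize_offline(recognizer, audio, language="en-US"):
    """Return the PocketSphinx transcript, or None if recognition fails.

    `recognizer` is expected to behave like sr.Recognizer and `audio`
    like the object returned by r.listen(source).
    """
    try:
        return recognizer.recognize_sphinx(audio, language=language)
    except UnknownValueError:
        # The engine ran but could not make sense of the audio.
        return None
    except RequestError as e:
        # PocketSphinx or its language data is missing or misconfigured.
        print(f"Error with the offline recognition engine: {e}")
        return None
```

With the code above, you would call text = recognize_offline(r, audio) and handle the None case wherever that fits your program.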
Key Notes for Offline Usage
- Accuracy: Offline recognition won't match the precision of Google's online service, especially in noisy environments or with less common vocabulary.
- Language Support: By default, PocketSphinx uses US English (en-US), which is the only language data bundled with speech_recognition. To use Chinese (or another language), first download the matching language pack and place it in the library's pocketsphinx-data directory, then specify the language code in the recognize_sphinx call. For example:
text = r.recognize_sphinx(audio, language='zh-CN')
If the language data is missing, recognize_sphinx raises a RequestError. The available language packs and install steps are described in the speech_recognition documentation's PocketSphinx notes.
- Model Tweaks: For better accuracy tailored to your use case, you can customize acoustic/language models, but that requires more advanced setup.
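Whether the data for a given language is present can be checked before calling recognize_sphinx. The sketch below assumes that speech_recognition looks for language data in a pocketsphinx-data/<language> folder next to its own package files, which matches the versions I have seen but may differ in yours. The helper names sphinx_language_dir and has_language_pack are made up for illustration.

```python
import importlib.util
from pathlib import Path


def sphinx_language_dir(language, package="speech_recognition"):
    """Return the folder where the language data is expected, or None.

    Assumption: the data lives in <package dir>/pocketsphinx-data/<language>.
    """
    spec = importlib.util.find_spec(package)
    if spec is None or spec.origin is None:
        return None  # package is not installed
    return Path(spec.origin).resolve().parent / "pocketsphinx-data" / language


def has_language_pack(language, package="speech_recognition"):
    """Return True if the expected language data folder exists."""
    directory = sphinx_language_dir(language, package)
    return directory is not None and directory.is_dir()


if __name__ == "__main__":
    for code in ("en-US", "zh-CN"):
        print(code, "installed" if has_language_pack(code) else "not found")
```

If zh-CN is reported as not found, recognize_sphinx(audio, language='zh-CN') will most likely fail with a RequestError until the language pack is installed.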
This question originally comes from Stack Exchange; asked by nadhem




