
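How can a custom speech recognition engine be integrated with Android's android.speech module?

Yes, you can: integrating a custom ASR engine with the android.speech module

You can combine your own speech recognition service with Android's android.speech module. The core idea is to write a RecognitionService subclass that talks to your server API, so that SpeechRecognizer can route recognition requests to it. The concrete steps and key code follow:

1. Implement a custom RecognitionService

RecognitionService is the base class for speech recognition services on Android. Subclass it and implement its three abstract methods (onStartListening, onStopListening, onCancel) to capture audio, upload it to your API, and return the recognized text.

Key implementation code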

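import android.content.Intent;
import android.media.AudioFormat;
import android.media.AudioRecord;
import android.media.MediaRecorder;
import android.os.Bundle;
import android.os.RemoteException;
import android.speech.RecognitionService;
import android.speech.RecognizerIntent;
import android.speech.SpeechRecognizer;
import android.util.Log;

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

import okhttp3.MediaType;
import okhttp3.MultipartBody;
import okhttp3.OkHttpClient;
import okhttp3.Request;
import okhttp3.RequestBody;
import okhttp3.Response;

// MediaRecorder has no WAV output format or PCM encoder, so this version captures
// raw PCM with AudioRecord and adds the 44-byte WAV header itself.
public class CustomASRService extends RecognitionService {
    private static final String TAG = "CustomASRService";
    private static final String API_URL = "http://192.168.1.100/ASR/demoSpeechToText";
    private static final int SAMPLE_RATE = 16000;                  // match the sample rate your API expects
    private static final int MAX_PCM_BYTES = SAMPLE_RATE * 2 * 60; // safety cap: 60 s of 16-bit mono audio

    /** Per-request flags, so a late cancel cannot affect the next request. */
    private static final class Session {
        volatile boolean recording = true;
        volatile boolean cancelled = false;
    }

    private final ExecutorService executor = Executors.newSingleThreadExecutor();
    private final OkHttpClient client = new OkHttpClient();
    private volatile Session session;

    @Override
    protected void onStartListening(Intent recognizerIntent, Callback callback) {
        // Use the language requested by the client, falling back to zh-CN.
        String requested = recognizerIntent.getStringExtra(RecognizerIntent.EXTRA_LANGUAGE);
        String language = requested != null ? requested : "zh-CN";
        Session s = new Session();
        session = s;
        // Service callbacks arrive on the main thread; record and upload on a worker.
        executor.execute(() -> recognize(s, language, callback));
    }

    @Override
    protected void onStopListening(Callback callback) {
        Session s = session;
        if (s != null) s.recording = false; // ends the capture loop; the upload follows
    }

    @Override
    protected void onCancel(Callback callback) {
        Session s = session;
        if (s != null) {
            s.cancelled = true; // drop the audio and report nothing to the client
            s.recording = false;
        }
    }

    @Override
    public void onDestroy() {
        executor.shutdownNow();
        super.onDestroy();
    }

    private void recognize(Session s, String language, Callback callback) {
        try {
            byte[] pcm = capturePcm(s, callback);
            if (s.cancelled) return;
            if (pcm == null) {
                callback.error(SpeechRecognizer.ERROR_AUDIO);
                return;
            }
            callback.endOfSpeech();
            String text = upload(toWav(pcm), language);
            if (s.cancelled) return;
            if (text == null) {
                callback.error(SpeechRecognizer.ERROR_SERVER);
                return;
            }
            // Build the result bundle that the client reads in onResults().
            ArrayList<String> matches = new ArrayList<>();
            matches.add(text);
            Bundle results = new Bundle();
            results.putStringArrayList(SpeechRecognizer.RESULTS_RECOGNITION, matches);
            callback.results(results);
        } catch (IOException e) {
            Log.w(TAG, "Upload failed", e);
            if (!s.cancelled) {
                try {
                    callback.error(SpeechRecognizer.ERROR_NETWORK);
                } catch (RemoteException ignored) {
                    // The client has gone away; nothing left to notify.
                }
            }
        } catch (RemoteException e) {
            Log.w(TAG, "Client disconnected", e);
        }
    }

    /** Records 16-bit mono PCM until the session stops; returns null if capture fails. */
    private byte[] capturePcm(Session s, Callback callback) throws RemoteException {
        int minBuf = AudioRecord.getMinBufferSize(SAMPLE_RATE,
                AudioFormat.CHANNEL_IN_MONO, AudioFormat.ENCODING_PCM_16BIT);
        if (minBuf <= 0) return null;
        AudioRecord recorder = new AudioRecord(MediaRecorder.AudioSource.MIC, SAMPLE_RATE,
                AudioFormat.CHANNEL_IN_MONO, AudioFormat.ENCODING_PCM_16BIT, minBuf * 2);
        try {
            // Initialization also fails when RECORD_AUDIO has not been granted.
            if (recorder.getState() != AudioRecord.STATE_INITIALIZED) return null;
            recorder.startRecording();
            callback.readyForSpeech(new Bundle());
            callback.beginningOfSpeech();
            ByteArrayOutputStream pcm = new ByteArrayOutputStream();
            byte[] buffer = new byte[minBuf];
            while (s.recording && pcm.size() < MAX_PCM_BYTES) {
                int n = recorder.read(buffer, 0, buffer.length);
                if (n < 0) return null; // negative values are AudioRecord error codes
                pcm.write(buffer, 0, n);
            }
            return pcm.toByteArray();
        } catch (IllegalStateException e) {
            return null;
        } finally {
            recorder.release(); // also stops the recorder if it is still running
        }
    }

    /** Prepends a canonical 44-byte header for 16-bit mono PCM at SAMPLE_RATE. */
    private static byte[] toWav(byte[] pcm) {
        ByteBuffer b = ByteBuffer.allocate(44 + pcm.length).order(ByteOrder.LITTLE_ENDIAN);
        b.put("RIFF".getBytes(StandardCharsets.US_ASCII)).putInt(36 + pcm.length)
                .put("WAVE".getBytes(StandardCharsets.US_ASCII))
                .put("fmt ".getBytes(StandardCharsets.US_ASCII))
                .putInt(16).putShort((short) 1).putShort((short) 1) // chunk size, PCM, mono
                .putInt(SAMPLE_RATE).putInt(SAMPLE_RATE * 2)        // sample rate, byte rate
                .putShort((short) 2).putShort((short) 16)           // block align, bits per sample
                .put("data".getBytes(StandardCharsets.US_ASCII)).putInt(pcm.length)
                .put(pcm);
        return b.array();
    }

    /** Uploads the WAV with OkHttp (swap in your own HTTP library); returns null on a non-2xx reply. */
    private String upload(byte[] wav, String language) throws IOException {
        RequestBody body = new MultipartBody.Builder()
                .setType(MultipartBody.FORM)
                .addFormDataPart("audio", "temp_asr.wav",
                        RequestBody.create(MediaType.parse("audio/wav"), wav))
                .addFormDataPart("language", language)
                .build();
        Request request = new Request.Builder().url(API_URL).post(body).build();
        try (Response response = client.newCall(request).execute()) {
            // Assumes the API replies with the recognized text as the plain response body.
            return response.isSuccessful() && response.body() != null
                    ? response.body().string() : null;
        }
    }
}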
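
Callback.error() accepts one of the integer ERROR_* codes defined on SpeechRecognizer. A small helper can keep the mapping from upload failures to those codes in one place. The sketch below is only an illustration: AsrErrorMapper and its method names are hypothetical, the constants copy the SpeechRecognizer values so that the class compiles off-device, and treating 4xx replies as ERROR_CLIENT is an assumption about how your API reports bad requests.

```java
import java.io.IOException;
import java.net.SocketTimeoutException;

/** Hypothetical helper: maps upload failures to the codes Callback.error(int) expects. */
final class AsrErrorMapper {
    // Values mirror android.speech.SpeechRecognizer; inside the service,
    // reference the SpeechRecognizer constants directly instead.
    static final int ERROR_NETWORK_TIMEOUT = 1;
    static final int ERROR_NETWORK = 2;
    static final int ERROR_SERVER = 4;
    static final int ERROR_CLIENT = 5;

    private AsrErrorMapper() { }

    /** Picks a code for an exception thrown while talking to the ASR API. */
    static int fromException(Exception e) {
        // SocketTimeoutException is an IOException, so it must be tested first.
        if (e instanceof SocketTimeoutException) return ERROR_NETWORK_TIMEOUT;
        if (e instanceof IOException) return ERROR_NETWORK;
        return ERROR_CLIENT;
    }

    /** Picks a code for a non-2xx HTTP status returned by the ASR API. */
    static int fromHttpStatus(int status) {
        if (status == 408 || status == 504) return ERROR_NETWORK_TIMEOUT;
        if (status >= 400 && status < 500) return ERROR_CLIENT; // assumption: request was malformed
        return ERROR_SERVER;
    }
}
```

In the service, the result would be passed straight to the callback, for example callback.error(AsrErrorMapper.fromHttpStatus(response.code())).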

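2. Register the service in AndroidManifest

Declare the custom service in the manifest and give it the matching intent filter so that the system can discover it as a speech recognition service:

<manifest ...>
    <uses-permission android:name="android.permission.RECORD_AUDIO" />
    <uses-permission android:name="android.permission.INTERNET" />
    <!-- Only needed on API 18 and below, and only if temporary audio is written to external storage -->
    <uses-permission android:name="android.permission.WRITE_EXTERNAL_STORAGE" android:maxSdkVersion="18" />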

    <application ...>
        <service
            android:name=".CustomASRService"
            android:exported="true">
            <intent-filter>
                <action android:name="android.speech.RecognitionService" />
            </intent-filter>
        </service>
    </application>
</manifest>

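3. Call the custom ASR service from your app

You can now drive your own service through SpeechRecognizer in the same way as the system recognizer. Note that createSpeechRecognizer(Context) binds to the device's default recognizer, so pass a ComponentName to select your service explicitly:

private SpeechRecognizer speechRecognizer;
private Intent recognizerIntent;

private void initSpeechRecognizer() {
    // android.content.ComponentName identifies CustomASRService as the recognizer to bind to
    speechRecognizer = SpeechRecognizer.createSpeechRecognizer(
            this, new ComponentName(this, CustomASRService.class));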
    speechRecognizer.setRecognitionListener(new RecognitionListener() {
        @Override
        public void onResults(Bundle results) {
            ArrayList<String> matches = results.getStringArrayList(SpeechRecognizer.RESULTS_RECOGNITION);
            if (matches != null && !matches.isEmpty()) {
                String recognizedText = matches.get(0);
                // Handle the recognized text
            }
        }

        @Override
        public void onError(int error) {
            // Handle failures such as network errors or a failed recording
        }

        // Implement the remaining RecognitionListener methods (onReadyForSpeech, onBeginningOfSpeech, etc.)
    });

    recognizerIntent = new Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH);
    recognizerIntent.putExtra(RecognizerIntent.EXTRA_LANGUAGE_MODEL, RecognizerIntent.LANGUAGE_MODEL_FREE_FORM);
    recognizerIntent.putExtra(RecognizerIntent.EXTRA_LANGUAGE, "zh-CN");
}

// Start recognition
public void startRecognition() {
    if (ContextCompat.checkSelfPermission(this, Manifest.permission.RECORD_AUDIO)
            != PackageManager.PERMISSION_GRANTED) {
        ActivityCompat.requestPermissions(this, new String[]{Manifest.permission.RECORD_AUDIO}, 100);
        return;
    }
    speechRecognizer.startListening(recognizerIntent);
}

// Stop recognition
public void stopRecognition() {
    speechRecognizer.stopListening();
}
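
The client above passes the language in EXTRA_LANGUAGE as a tag such as "zh-CN". If the service forwards that value to the API, it may be worth normalizing it first, since callers sometimes send variants like "zh_CN" or "en-us". The following is a minimal sketch using java.util.Locale (available on Android from API 21); LanguageTags and normalize are hypothetical names, and whether your API wants the "zh-CN" form at all is an assumption.

```java
import java.util.Locale;

/** Hypothetical helper: normalizes a client-supplied language tag before it is sent to the API. */
final class LanguageTags {
    private LanguageTags() { }

    /**
     * Returns the tag in canonical form (for example "zh-CN"),
     * or the fallback when the input is missing or not a usable tag.
     */
    static String normalize(String requested, String fallback) {
        if (requested == null) return fallback;
        // Accept the underscore form some callers use for locales.
        Locale locale = Locale.forLanguageTag(requested.trim().replace('_', '-'));
        // An ill-formed or empty tag parses to a locale with no language.
        return locale.getLanguage().isEmpty() ? fallback : locale.toLanguageTag();
    }
}
```

Inside onStartListening, the service could then call LanguageTags.normalize(recognizerIntent.getStringExtra(RecognizerIntent.EXTRA_LANGUAGE), "zh-CN").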

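Key points to watch

  • Audio format: make sure the parameters of the WAV data you record (sample rate, bit depth, channel count) exactly match what your API expects; otherwise recognition will fail.
  • Asynchronous work: RecognitionService callbacks arrive on the main thread, so neither audio capture nor the network request may run there. Use a thread pool or coroutines to avoid ANRs.
  • Permissions: on Android 6.0 and above, RECORD_AUDIO must be requested at runtime. The sample endpoint uses plain HTTP, which is blocked by default for apps targeting Android 9 (API 28) and above, so either serve the API over HTTPS or allow cleartext traffic for that host in a network security configuration.
  • Error callbacks: call Callback.error() with the appropriate SpeechRecognizer.ERROR_* code in every failure path (recording failure, network error, error response from the API) so that the client is always notified.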
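
To make the audio-format point concrete, a quick sanity check of the header before uploading can catch a mismatch early. The sketch below is an illustration, not part of the original answer: WavHeaderCheck is a hypothetical name, and it assumes the canonical 44-byte PCM layout with the fmt chunk directly after the RIFF header, so files with extra chunks before fmt would be rejected.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: checks that WAV bytes carry the parameters the API expects. */
final class WavHeaderCheck {
    private WavHeaderCheck() { }

    /** Returns true if the data starts with a canonical PCM WAV header with these parameters. */
    static boolean matches(byte[] wav, int sampleRate, int channels, int bitsPerSample) {
        if (wav == null || wav.length < 44) return false;
        // All multi-byte WAV header fields are little-endian.
        ByteBuffer b = ByteBuffer.wrap(wav).order(ByteOrder.LITTLE_ENDIAN);
        boolean tagsOk = tag(wav, 0).equals("RIFF")
                && tag(wav, 8).equals("WAVE")
                && tag(wav, 12).equals("fmt ");
        return tagsOk
                && b.getShort(20) == 1              // audio format 1 = uncompressed PCM
                && b.getShort(22) == channels       // channel count
                && b.getInt(24) == sampleRate       // samples per second
                && b.getShort(34) == bitsPerSample; // bit depth
    }

    /** Reads a four-character chunk tag at the given offset. */
    private static String tag(byte[] data, int offset) {
        return new String(data, offset, 4, StandardCharsets.US_ASCII);
    }
}
```

For the 16 kHz, mono, 16-bit settings used in this answer, the call would be WavHeaderCheck.matches(wavBytes, 16000, 1, 16), with a false result reported to the client as an audio error rather than sent to the server.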

The question comes from Stack Exchange; it was asked by AKA.
