Plivo <Stream> + ElevenLabs Conversational AI: no audio playback on a bidirectional WebSocket stream call, help troubleshooting
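Hi everyone, I'm building an outbound calling system that places calls with Plivo and forwards bidirectional audio over a WebSocket to ElevenLabs Conversational AI, so that the caller and the AI can talk in real time. The stack is Node.js + Fastify + @fastify/websocket + the Plivo SDK, and the intended flow is: caller speaks → audio goes to ElevenLabs → AI generates reply audio → audio is sent back to Plivo and played to the caller.
Calls are placed successfully, and both WebSocket connections (Plivo and ElevenLabs) are established. The logs show ElevenLabs returning audio events, but the caller never hears the AI's reply. I've checked the basic configuration and still can't find the problem, so I'd appreciate help spotting what might be wrong in the code.
Here is my complete code: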

    const fastify = require('fastify')();
    const websocketPlugin = require('@fastify/websocket');
    const plivo = require('plivo');
    const WebSocket = require('ws');
    const { promisify } = require('util');
    require('dotenv').config();

    // Environment variables
    const {
      PLIVO_AUTH_ID,
      PLIVO_AUTH_TOKEN,
      PLIVO_PHONE_NUMBER,
      ELEVENLABS_AGENT_ID
    } = process.env;

    // Initialize the Plivo client
    const plivoClient = new plivo.Client();

    // Register the WebSocket plugin
    fastify.register(websocketPlugin);

    // Endpoint that places the outbound call
    fastify.post('/make-outbound-call', async (request, reply) => {
      const toNumber = request.body.to;
      if (!toNumber) {
        reply.code(400).send({ error: 'Request body is missing the "to" number' });
        return;
      }
      try {
        const answerUrl = `${request.protocol}://${request.hostname}/plivo/answer`;
        await plivoClient.calls.create(
          PLIVO_PHONE_NUMBER,
          toNumber,
          answerUrl,
          { answer_method: 'GET' }
        );
        reply.code(200).send({ message: 'Call initiated' });
      } catch (err) {
        console.error('Error placing Plivo call:', err);
        reply.code(500).send({ error: 'Failed to place call' });
      }
    });

    // Return the Plivo XML that starts the bidirectional stream
    fastify.get('/plivo/answer', (request, reply) => {
      const wsUrl = `${request.protocol === 'https' ? 'wss' : 'ws'}://${request.hostname}/outbound-media-stream`;
      const responseXml = `
        <Response>
          <Stream bidirectional="true" keepCallAlive="true" contentType="audio/x-mulaw;rate=8000">
            ${wsUrl}
          </Stream>
        </Response>`;
      reply.header('Content-Type', 'application/xml').send(responseXml.trim());
    });

    // Bridge the bidirectional WebSocket stream between Plivo and ElevenLabs
    fastify.get('/outbound-media-stream', { websocket: true }, (connection, req) => {
      let streamId = null;

      // Connect to the ElevenLabs Conversational AI WebSocket
      const ElevenLabsUrl = `wss://api.elevenlabs.io/v1/convai/conversation?agent_id=${ELEVENLABS_AGENT_ID}`;
      const ElevenWs = new WebSocket(ElevenLabsUrl);

      // Once the ElevenLabs connection opens, send the init event specifying the audio format
      ElevenWs.on('open', () => {
        console.log('ElevenLabs WebSocket connection opened');
        const initEvent = {
          type: "conversation_initiation_client_data",
          audio_output_format: {
            sample_rate: 8000,
            encoding: "ULAW",
            bits_per_sample: 8
          }
        };
        console.log('Sending init event to ElevenLabs:', JSON.stringify(initEvent, null, 2));
        ElevenWs.send(JSON.stringify(initEvent));
      });

      // Handle messages from ElevenLabs
      ElevenWs.on('message', (data) => {
        let event;
        try {
          event = JSON.parse(data);
        } catch (parseErr) {
          console.error('Invalid JSON from ElevenLabs:', parseErr);
          return;
        }

        // Answer ping heartbeats
        if (event.type === 'ping') {
          ElevenWs.send(JSON.stringify({
            type: 'pong',
            event_id: event.ping_event.event_id
          }));
          return;
        }

        // Forward the AI's audio to Plivo for playback
        if (event.type === 'audio') {
          const base64Audio = event.audio_event.audio_base_64;
          const plivoPlay = JSON.stringify({
            event: "playAudio",
            media: {
              contentType: "audio/x-mulaw",
              sampleRate: 8000,
              payload: base64Audio
            }
          });
          connection.socket.send(plivoPlay);
          console.log('Sent AI audio to Plivo');
        }

        // On an interruption signal from the AI, clear Plivo's audio buffer
        if (event.type === 'interruption') {
          if (streamId) {
            const clearEvent = JSON.stringify({
              event: "clearAudio",
              streamId: streamId
            });
            connection.socket.send(clearEvent);
          }
        }
      });

      // Handle messages from Plivo (events and audio)
      connection.socket.on('message', (message) => {
        let msg;
        try {
          msg = JSON.parse(message.toString());
        } catch (err) {
          console.error('Invalid JSON from Plivo:', err);
          return;
        }

        // Remember Plivo's streamId so the audio buffer can be cleared later
        if (msg.event === 'start') {
          streamId = msg.start.streamId;
        }

        // Forward the caller's audio to ElevenLabs
        if (msg.event === 'media' && msg.media && msg.media.track === 'inbound') {
          const audioPayload = msg.media.payload;
          if (audioPayload) {
            const audioEvent = JSON.stringify({
              type: "audio",
              audio_event: {
                audio_base_64: audioPayload,
                audio_format: {
                  sample_rate: 8000,
                  encoding: "ULAW",
                  bits_per_sample: 8
                }
              }
            });
            ElevenWs.send(audioEvent);
            console.log('Sent caller audio to ElevenLabs');
          }
        }
      });

      // Handle connection close on either side
      connection.socket.on('close', () => {
        console.log('Plivo WebSocket connection closed');
        ElevenWs.close();
      });

      ElevenWs.on('close', () => {
        console.log('ElevenLabs WebSocket connection closed');
        connection.socket.close();
      });
    });

    // Start the server
    const start = async () => {
      try {
        await fastify.listen({ port: 3000 });
        console.log(`Server running on port ${fastify.server.address().port}`);
      } catch (err) {
        fastify.log.error(err);
        process.exit(1);
      }
    };

    start();

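What I've checked so far:
- Confirmed that Plivo's <Stream> tag is configured with bidirectional="true" and that its contentType is audio/x-mulaw;rate=8000, which matches the output format specified in the ElevenLabs initialization event.
- The logs show ElevenLabs returning audio events, and I forward the corresponding base64 audio to Plivo in a playAudio event.
- Checked the Plivo playAudio event format. I built it according to the official documentation, but I'm not sure whether I've missed a parameter.
- I'm not sure whether the caller audio I forward to ElevenLabs is in the right format. Plivo's inbound media payload is base64-encoded mulaw; does ElevenLabs need any extra processing?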
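To narrow down the points above, it might help to inspect each chunk just before it goes out in a playAudio event. The following is only a diagnostic sketch, not a fix: `describeMulawChunk` is a hypothetical helper (it is not part of the Plivo or ElevenLabs SDKs), and it assumes the payload is supposed to be 8-bit mu-law at 8000 Hz, i.e. one byte per sample.

```javascript
// Hypothetical diagnostic helper, not part of any SDK.
// Assumption: the payload should be 8-bit mu-law at 8000 Hz (1 byte per sample).
function describeMulawChunk(base64Payload, sampleRate = 8000) {
  if (typeof base64Payload !== 'string' || base64Payload.length === 0) {
    return { ok: false, reason: 'payload is missing or not a string' };
  }

  // Buffer.from() does not throw on malformed base64, so decode, re-encode,
  // and compare (ignoring '=' padding) to detect a payload that is not clean.
  const bytes = Buffer.from(base64Payload, 'base64');
  const stripPadding = (s) => s.replace(/=+$/, '');
  if (stripPadding(bytes.toString('base64')) !== stripPadding(base64Payload)) {
    return { ok: false, reason: 'payload is not clean base64' };
  }

  return {
    ok: true,
    byteLength: bytes.length,
    // At 1 byte per sample, duration follows directly from the byte count.
    durationMs: Math.round((bytes.length / sampleRate) * 1000),
  };
}

// Example: 800 bytes of mu-law silence (0xff) is 100 ms of audio at 8000 Hz.
const sample = Buffer.alloc(800, 0xff).toString('base64');
console.log(describeMulawChunk(sample));
```

Calling it in the `audio` handler right before `connection.socket.send(plivoPlay)` and logging the result would show whether the chunks are empty, malformed, or of an implausible duration. If the reported durations look about twice as long as the speech actually lasts, that could hint that ElevenLabs is sending 16-bit PCM rather than mu-law, but that is only a guess to verify against the agent's configured output format.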
Has anyone run into a similar problem, or can anyone help me see what might be wrong in the code?
Source: Stack Exchange




