How to Implement Voice Calls Between Node.js and the Browser (Audio Streaming, VoIP)

阿华AIGC实验室

2026-5-29

Browser ↔ Node.js Real-Time Voice Communication (tvoip Project)

Hey there, let's work through your questions. You've already got the Node.js-to-Node.js TCP PCM voice call up and running, which is great progress.

Key Questions & Answers

1. Does WebSocket support stream transmission?

Native WebSocket is message-oriented rather than stream-based, but messages are delivered reliably and in order, so you can build streaming on top of it by:

Splitting your PCM data into small chunks and sending them sequentially as binary messages
Wrapping the WebSocket in a readable/writable stream so audio data can be piped to and from the connection (with the browser's Streams API on the client, or a Node.js stream on the server)
Libraries like socket.io-stream hide this complexity by handling chunking and stream piping under the hood.
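As a minimal sketch of the chunking approach, assuming 16-bit mono PCM at 48 kHz cut into 20 ms frames. Those parameters and the helper names frameBytes and splitIntoFrames are illustrative choices, not something taken from tvoip:

```javascript
// Bytes in one frame of PCM audio for the given (assumed) parameters.
function frameBytes(sampleRate, channels, bytesPerSample, frameMs) {
  return Math.round(sampleRate * frameMs / 1000) * channels * bytesPerSample;
}

// Split a Uint8Array (or Node.js Buffer) of PCM bytes into whole frames.
// Returns the frames plus the leftover bytes to prepend to the next input.
function splitIntoFrames(pcmBytes, size) {
  const frames = [];
  let offset = 0;
  while (offset + size <= pcmBytes.length) {
    frames.push(pcmBytes.subarray(offset, offset + size));
    offset += size;
  }
  return { frames, rest: pcmBytes.subarray(offset) };
}

const size = frameBytes(48000, 1, 2, 20); // 960 samples * 2 bytes = 1920
const { frames, rest } = splitIntoFrames(new Uint8Array(5000), size);
// Each frame would then go out as one binary message: socket.send(frame)
console.log(size, frames.length, rest.length);
```

Keeping the leftover bytes matters because the audio source rarely hands you data in exact multiples of the frame size.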

2. Do I need to convert audio stream formats?

It depends on parameter consistency between the browser and Node.js:

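In the browser, getUserMedia + AudioContext typically gives you PCM as 32-bit float samples in the range -1 to 1 (Float32Array), at a 44.1 kHz or 48 kHz sample rate, with a channel count that depends on the device and your constraints. Typed arrays use the platform's byte order, which is little-endian on virtually all current hardware.
If your Node.js server processes PCM with exactly the same parameters, no conversion is needed. If it expects 16-bit signed PCM, convert the float samples to 16-bit integers in the browser before sending.
If Node.js is using a different format (e.g., WAV with a file header, or a different sample rate), you'll need to: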
- Strip WAV headers on Node.js before sending to the browser
- Resample/convert bit depth using libraries like pcm-convert or ffmpeg if parameters don't match
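Web Audio delivers samples as 32-bit floats, so sending 16-bit PCM to Node.js needs a conversion step. Here is one way it could look, as a sketch; the function name floatTo16BitPcm is made up for this example, and resampling is not covered:

```javascript
// Convert Web Audio style float samples (range -1..1) to 16-bit signed
// little-endian PCM. Returns an ArrayBuffer that can be sent as binary.
function floatTo16BitPcm(float32Samples) {
  const out = new DataView(new ArrayBuffer(float32Samples.length * 2));
  for (let i = 0; i < float32Samples.length; i++) {
    // Clamp first: values outside -1..1 would otherwise wrap around.
    const s = Math.max(-1, Math.min(1, float32Samples[i]));
    const scaled = s < 0 ? s * 0x8000 : s * 0x7fff;
    out.setInt16(i * 2, Math.round(scaled), true); // true = little-endian
  }
  return out.buffer;
}

const pcm = floatTo16BitPcm(new Float32Array([0, 1, -1, 0.5, 2]));
console.log(new Uint8Array(pcm));
```

Writing through a DataView with an explicit little-endian flag keeps the byte order fixed regardless of the machine the browser runs on.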

3. Which protocol should I use?

For real-time voice communication, here are your top options:

WebRTC: The gold standard for browser-based real-time audio/video. It's optimized for low latency, handles NAT traversal through ICE (given STUN/TURN servers), and has built-in congestion control. You can use the wrtc npm package (node-webrtc) to add WebRTC support to your Node.js server, allowing direct peer-to-peer (or server-mediated) audio streams between browser and Node.
WebSocket + Streams API: If you want to stay closer to your existing TCP/socket.io setup, use native WebSocket and send PCM chunks as binary messages, optionally piping them through the browser's Streams API. This avoids the extra overhead of socket.io if you don't need its fallback features.
socket.io-stream: A valid option, but as you've noticed, it can hit bottlenecks for real-time audio due to socket.io's additional framing and fallback mechanisms.
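If you go with raw WebSocket, one real-time concern is what happens when the network falls behind. The sketch below shows one possible policy (an assumption for illustration, not something from tvoip): drop frames instead of letting them queue up, using the standard readyState and bufferedAmount properties of the browser WebSocket. The name createPcmSender and the 16 KiB threshold are made up.

```javascript
const WS_OPEN = 1; // numeric value of WebSocket.OPEN

// Wrap a WebSocket-like object so that audio frames are dropped, not queued,
// when too many bytes are still waiting to be sent.
function createPcmSender(socket, maxBufferedBytes = 16 * 1024) {
  let dropped = 0;
  return {
    send(frame) {
      if (socket.readyState !== WS_OPEN) return false;
      if (socket.bufferedAmount > maxBufferedBytes) {
        dropped += 1; // late audio is useless, so skip this frame
        return false;
      }
      socket.send(frame); // frame is an ArrayBuffer or typed array
      return true;
    },
    get dropped() { return dropped; },
  };
}

// Browser usage (the URL is a placeholder):
// const ws = new WebSocket('ws://your-server/audio');
// ws.binaryType = 'arraybuffer';
// const sender = createPcmSender(ws);
// sender.send(int16Frame);
```

Dropping trades a brief glitch for bounded latency; for a voice call that is usually the better deal than audio arriving seconds late.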

4. Is socket.io-stream suitable for this kind of stream transmission?

Yes, but with caveats:

It works for streaming binary data (like PCM) by wrapping socket.io's message system into a stream interface.
The bottlenecks you're seeing are likely from:
- Socket.io's default behavior of starting each connection with HTTP long-polling and only then upgrading to WebSocket (force WebSocket only with transports: ['websocket'] in client/server config)
- Unnecessary buffering in the stream pipeline (try adjusting highWaterMark values when creating streams)
- Overhead from socket.io's message framing compared to raw WebSocket
If low latency is critical, a WebRTC DataChannel or raw WebSocket + Streams will usually perform better.

5. Is browser PCM compatible with Node.js?

Yes, as long as the PCM parameters match. Double-check these settings on both ends:

Sample rate (44.1kHz vs 48kHz)
Sample format (Web Audio works in 32-bit float; 16-bit signed integer is the usual wire format on the Node.js side)
Channel count (mono vs stereo)
Byte order (little-endian is typical for both browser and Node.js)
If these align, you can pipe the browser's PCM data directly to Node.js and vice versa without conversion.
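For the Node.js-to-browser direction, 16-bit PCM has to be turned back into floats before Web Audio can play it. A minimal sketch, assuming 16-bit signed little-endian mono input; the function name int16LeToFloat32 is hypothetical:

```javascript
// Decode 16-bit signed little-endian PCM bytes into float samples (-1..1),
// the representation Web Audio buffers use.
// Accepts a Uint8Array or a Node.js Buffer.
function int16LeToFloat32(bytes) {
  // byteOffset/byteLength matter: a Buffer may be a view into a larger pool.
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const out = new Float32Array(Math.floor(bytes.byteLength / 2));
  for (let i = 0; i < out.length; i++) {
    out[i] = view.getInt16(i * 2, true) / 0x8000; // true = little-endian
  }
  return out;
}

// Three samples: 0x0000, 0x4000 and 0x8000, stored low byte first.
const samples = int16LeToFloat32(new Uint8Array([0x00, 0x00, 0x00, 0x40, 0x00, 0x80]));
console.log(samples);
```

In the browser, the resulting Float32Array could then be copied into an AudioBuffer with copyToChannel(samples, 0) for playback.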

Tips to Fix socket.io-stream Bottlenecks

If you want to stick with socket.io-stream for now, try these tweaks:

Force WebSocket transport only:

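// Client side (bundled for the browser)
const io = require('socket.io-client');
const socket = io('http://your-server', { transports: ['websocket'] });

// Server side (separate file, attached to a plain HTTP server)
const server = require('http').createServer();
const io = require('socket.io')(server, { transports: ['websocket'] });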

Reduce stream buffer sizes:

const ss = require('socket.io-stream');
const stream = ss.createStream({ highWaterMark: 1024 }); // Smaller buffer = lower latency

Avoid unnecessary data processing: Make sure you're not encoding/decoding the PCM data more than needed (e.g., don't convert to base64; send raw binary).
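To put a number on the base64 cost, here is a small Node.js check. The frame size assumes 20 ms of 48 kHz, 16-bit mono PCM, which is an illustrative choice rather than a tvoip setting:

```javascript
// One 20 ms frame of 48 kHz, 16-bit mono PCM (assumed parameters).
const frame = Buffer.alloc(1920);

const binarySize = frame.length;                    // bytes when sent as binary
const base64Size = frame.toString('base64').length; // characters when sent as text

console.log(binarySize, base64Size, (base64Size / binarySize).toFixed(2));
```

Base64 emits 4 characters for every 3 input bytes, so the payload grows by about a third, on top of the CPU time spent encoding and decoding every frame.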

Better Alternative: WebRTC

For real-time voice, WebRTC is a better fit. Here's a quick outline:

On Node.js, use the wrtc package to create an RTCPeerConnection
Exchange SDP offers/answers and ICE candidates between browser and Node.js (you can use a simple WebSocket signaling server for this)
Use an RTCDataChannel to send raw PCM chunks, or an audio MediaStreamTrack for built-in audio handling (wrtc exposes nonstandard RTCAudioSource and RTCAudioSink helpers for feeding and reading PCM on the Node.js side)
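The outline above can be sketched for the offering side roughly as follows. This is a sketch under assumptions, not a complete implementation: the signaling object (anything with send(msg) and an assignable onmessage(msg), such as a thin JSON wrapper around a WebSocket) and the message shapes are hypothetical, and error handling is left out. The RTCPeerConnection calls themselves are the standard ones.

```javascript
// Caller side of a WebRTC handshake. `pc` is an RTCPeerConnection;
// `signaling` is a hypothetical transport with send(msg) and onmessage(msg).
async function startCall(pc, signaling) {
  // Unordered, no retransmits: a late audio frame is dropped, not resent.
  const channel = pc.createDataChannel('pcm', { ordered: false, maxRetransmits: 0 });
  channel.binaryType = 'arraybuffer';

  // Forward our ICE candidates to the other side as they are discovered.
  pc.onicecandidate = (event) => {
    if (event.candidate) signaling.send({ type: 'candidate', candidate: event.candidate });
  };

  // Apply the answer and the remote candidates when they arrive.
  signaling.onmessage = async (msg) => {
    if (msg.type === 'answer') await pc.setRemoteDescription(msg.answer);
    if (msg.type === 'candidate') await pc.addIceCandidate(msg.candidate);
  };

  const offer = await pc.createOffer();
  await pc.setLocalDescription(offer);
  signaling.send({ type: 'offer', offer: pc.localDescription });
  return channel; // call channel.send(arrayBuffer) once it has fired 'open'
}
```

In the browser, pc would come from new RTCPeerConnection({ iceServers: [...] }); on Node.js, the wrtc package exports an RTCPeerConnection with the same interface. The answering side mirrors this: it applies the offer with setRemoteDescription, then calls createAnswer and setLocalDescription, and sends the answer back.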

The question comes from Stack Exchange and was asked by Forivin.