This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
node-vad is a Voice Activity Detection (VAD) library for Node.js that wraps the WebRTC VAD algorithm from Chromium. It's a native addon that combines C/C++ code with JavaScript, using NAN (Native Abstractions for Node.js) for the Node.js bindings.
- C Layer (src/simplevad.c, src/simplevad.h)
  - Implements vadAllocate(), vadInit(), vadSetMode(), and vadProcessAudio()
  - Manages state, frame buffering (30ms frames), and sample rate conversion
  - Wraps the WebRTC VAD implementation from vendor/webrtc_vad/
  - Converts float samples to 16-bit PCM and aggregates results using an 80% threshold
- C++ Bindings Layer (src/vad_bindings.cc)
  - Uses NAN to create Node.js bindings: vad_alloc, vad_init, vad_setmode, vad_processAudio
  - Implements an async worker (VADWorker) for non-blocking audio processing
  - Handles Buffer/ArrayBufferView compatibility across Node.js versions
- JavaScript Layer (lib/vad.js)
  - VAD class: main API with processAudio() and processAudioFloat() methods
  - VADStream class: Transform stream that chunks audio, debounces speech events, and emits structured output
  - Converts 16-bit signed integers to normalized floats (±1.0 range)
- index.js exports the VAD class
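The two sample conversions mentioned above (int16 to float in the JavaScript layer, float back to 16-bit PCM in the C layer) can be sketched in plain JavaScript. This is a minimal illustration, not the repository's code: int16ToFloat32 and float32ToInt16 are hypothetical names, and the sketch assumes little-endian mono samples scaled by 32768.

```javascript
// Hypothetical helpers illustrating the conversions; not node-vad's actual functions.

// JS-layer direction: 16-bit signed LE PCM -> float32 LE in [-1.0, 1.0).
function int16ToFloat32(pcm) {
  const sampleCount = pcm.length / 2;
  const out = Buffer.alloc(sampleCount * 4); // float32 needs twice the bytes of int16
  for (let i = 0; i < sampleCount; i++) {
    const s = pcm.readInt16LE(i * 2);
    out.writeFloatLE(s / 32768, i * 4); // assumption: scale by 2^15
  }
  return out;
}

// C-layer direction, sketched in JS: float32 -> int16, clamping out-of-range values.
function float32ToInt16(floats) {
  const sampleCount = floats.length / 4;
  const out = Buffer.alloc(sampleCount * 2);
  for (let i = 0; i < sampleCount; i++) {
    const f = Math.max(-1, Math.min(1, floats.readFloatLE(i * 4)));
    const s = Math.round(f * 32768);
    out.writeInt16LE(Math.max(-32768, Math.min(32767, s)), i * 2);
  }
  return out;
}

const pcm = Buffer.alloc(6);
pcm.writeInt16LE(-32768, 0);
pcm.writeInt16LE(0, 2);
pcm.writeInt16LE(16384, 4);
const floats = int16ToFloat32(pcm);
console.log(floats.length); // 12
console.log(floats.readFloatLE(0), floats.readFloatLE(4), floats.readFloatLE(8)); // -1 0 0.5
console.log(float32ToInt16(floats).equals(pcm)); // true
```

The float copy is twice the size of the int16 input, which is where the memory cost of the conversion comes from.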
Data flow: Audio Buffer → VAD.processAudio() → toFloatBuffer() → Native binding → vadProcessAudio() → Frame chunking (30ms) → WebRTC VAD → Event aggregation → Promise result
For streams: input data is buffered into 60 ms chunks (the byte length depends on the sample rate, e.g. 1920 bytes of 16-bit mono audio at 16kHz), processed sequentially, and the debounced speech state is emitted with timing metadata.
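The stream-side buffering can be sketched as a small accumulator that holds partial data and releases only complete chunks. This is a hypothetical helper (ChunkAccumulator and chunkBytes are illustrative names), assuming 16-bit mono PCM and chunks that cover 60 ms of audio, so the byte length scales with the sample rate.

```javascript
// Hypothetical sketch of stream chunking; not the VADStream implementation.

// Bytes per chunk for a given rate, assuming 16-bit (2-byte) mono samples.
function chunkBytes(sampleRate, chunkMs = 60, bytesPerSample = 2) {
  return ((sampleRate * chunkMs) / 1000) * bytesPerSample;
}

class ChunkAccumulator {
  constructor(chunkLength) {
    this.chunkLength = chunkLength;
    this.pending = Buffer.alloc(0); // leftover bytes not yet forming a full chunk
  }

  // Appends `data` and returns every complete chunk now available.
  push(data) {
    this.pending = Buffer.concat([this.pending, data]);
    const chunks = [];
    while (this.pending.length >= this.chunkLength) {
      chunks.push(this.pending.subarray(0, this.chunkLength));
      this.pending = this.pending.subarray(this.chunkLength);
    }
    return chunks;
  }
}

const acc = new ChunkAccumulator(chunkBytes(16000)); // 1920 bytes per 60 ms
console.log(acc.push(Buffer.alloc(1000)).length); // 0 (not enough data yet)
console.log(acc.push(Buffer.alloc(3000)).length); // 2 (4000 bytes -> 2 full chunks)
console.log(acc.pending.length); // 160 (carried over to the next write)
```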
```bash
# Install dependencies and build native addon
npm install  # Runs node-gyp rebuild automatically

# Rebuild manually
node-gyp rebuild

# Clean build
node-gyp clean
node-gyp configure
node-gyp build
```

```bash
# Run the non-stream example
cd examples
node process.js

# Run the stream example (from the same examples directory)
node stream.js
```

Both examples use examples/demo_pcm_s16_16000.raw as test audio (16-bit PCM, 16kHz).

```bash
npm run release-patch  # Bumps patch version, publishes
npm run release-minor  # Bumps minor version, publishes
npm run release-major  # Bumps major version, publishes
```

Audio format:
- Supported rates: 8000Hz, 16000Hz (recommended), 32000Hz, 48000Hz
- Sample rate must remain constant after initialization (enforced in src/simplevad.c)
- 16kHz is optimal for performance/accuracy tradeoff
- Input: 16-bit signed PCM (via processAudio()) or 32-bit normalized float (via processAudioFloat())
- Internal: converted to the float range [-1.0, 1.0] in lib/vad.js:43-51
- Frame size: 30ms chunks (e.g., 480 samples at 16kHz)
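The frame sizes follow directly from the sample rate. A small sketch, assuming 16-bit mono PCM and the four supported rates listed above (frameSize is a hypothetical helper, not part of node-vad):

```javascript
// Sketch: 30 ms frame sizes per supported rate, assuming 16-bit mono PCM input.
const SUPPORTED_RATES = [8000, 16000, 32000, 48000];
const FRAME_MS = 30;

function frameSize(sampleRate) {
  if (!SUPPORTED_RATES.includes(sampleRate)) {
    throw new RangeError(`Unsupported sample rate: ${sampleRate}`);
  }
  const samples = (sampleRate * FRAME_MS) / 1000;
  // int16 input uses 2 bytes per sample; the float32 copy uses 4.
  return { samples, int16Bytes: samples * 2, float32Bytes: samples * 4 };
}

for (const rate of SUPPORTED_RATES) {
  console.log(rate, frameSize(rate));
}
// e.g. 16000 -> { samples: 480, int16Bytes: 960, float32Bytes: 1920 }
```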
VAD modes define detection sensitivity (from lib/vad.js:173-178):
- NORMAL: High bitrate, low-noise audio (may classify noise as voice)
- LOW_BITRATE: Optimized for low-bitrate audio
- AGGRESSIVE: Better for noisy environments
- VERY_AGGRESSIVE: Lowest miss rate, works for most inputs
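WebRTC VAD itself takes an aggressiveness level from 0 to 3 via WebRtcVad_set_mode(). The sketch below assumes the four modes map onto that scale in the order listed; the Mode object and checkMode() are illustrative, so confirm the actual values in lib/vad.js before relying on them.

```javascript
// Hypothetical mode table: values ASSUME a 1:1 mapping onto WebRTC's 0-3 scale.
const Mode = Object.freeze({
  NORMAL: 0,
  LOW_BITRATE: 1,
  AGGRESSIVE: 2,
  VERY_AGGRESSIVE: 3,
});

// Rejects anything that is not one of the four known modes before it reaches native code.
function checkMode(mode) {
  if (!Object.values(Mode).includes(mode)) {
    throw new RangeError(`Invalid VAD mode: ${mode}`);
  }
  return mode;
}

console.log(checkMode(Mode.AGGRESSIVE)); // 2
```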
The VADStream class (lib/vad.js:54-164):
- Buffers incomplete data and processes complete 60 ms chunks (the byte length depends on the sample rate)
- Uses debounceTime (default 1000ms) to prevent rapid speech state toggling
- Emits objects with time, audioData, and speech metadata (state, start, end, startTime, duration)
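The debounce behaviour can be sketched as a small state machine: speech starts on the first voiced chunk and ends only once debounceTime ms pass without another voiced chunk. SpeechDebouncer is a hypothetical class, and the exact meaning given here to the start/end/duration fields is an assumption modeled on the metadata listed above, not the logic in lib/vad.js.

```javascript
// Hypothetical debouncer: turns per-chunk VAD decisions into speech events.
class SpeechDebouncer {
  constructor(debounceTime = 1000) {
    this.debounceTime = debounceTime; // ms of silence required to end speech
    this.state = false;               // true while speech is active
    this.startTime = 0;               // stream time (ms) when speech began
    this.lastVoiceTime = 0;           // stream time (ms) of the last voiced chunk
  }

  // `time` is the chunk's position in the stream in ms; `isVoice` is the VAD decision.
  update(time, isVoice) {
    let start = false;
    let end = false;
    if (isVoice) {
      if (!this.state) {
        this.state = true;
        this.startTime = time;
        start = true;
      }
      this.lastVoiceTime = time;
    } else if (this.state && time - this.lastVoiceTime > this.debounceTime) {
      this.state = false;
      end = true;
    }
    return {
      state: this.state,
      start,
      end,
      startTime: this.startTime,
      // Elapsed speech time while active (or at the moment it ends), else 0.
      duration: this.state || end ? time - this.startTime : 0,
    };
  }
}

const debouncer = new SpeechDebouncer(1000);
console.log(debouncer.update(0, true).start);    // true (speech begins)
console.log(debouncer.update(500, false).state); // true (short pause is bridged)
console.log(debouncer.update(1200, false).end);  // true (more than 1000 ms since last voice)
```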
In src/simplevad.c:202-221, the vadDecision() function:
- Processes multiple frames per call
- Uses an 80% threshold: if at least 80% of frames are VOICE, returns VOICE
- Otherwise returns SILENCE (or ERROR if any frame errors)
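The aggregation rule can be illustrated in JavaScript. The VadEvent values mirror the return convention of WebRtcVad_Process() (1 voice, 0 no voice, -1 error); aggregate() is a hypothetical function, and the order of the error check follows the description above rather than the C source, so check vadDecision() for the exact precedence.

```javascript
// Illustrative per-frame results, matching WebRtcVad_Process() return values.
const VadEvent = Object.freeze({ ERROR: -1, SILENCE: 0, VOICE: 1 });

// VOICE if at least `threshold` of frames are voiced,
// otherwise ERROR if any frame failed, otherwise SILENCE.
function aggregate(frameResults, threshold = 0.8) {
  if (frameResults.length === 0) return VadEvent.SILENCE; // assumption: empty input is silence
  let voiced = 0;
  let sawError = false;
  for (const r of frameResults) {
    if (r === VadEvent.VOICE) voiced++;
    else if (r === VadEvent.ERROR) sawError = true;
  }
  if (voiced / frameResults.length >= threshold) return VadEvent.VOICE;
  return sawError ? VadEvent.ERROR : VadEvent.SILENCE;
}

const { VOICE: V, SILENCE: S, ERROR: E } = VadEvent;
console.log(aggregate([V, V, V, V, S])); // 1 (4 of 5 = 80% voiced)
console.log(aggregate([V, V, V, S]));    // 0 (75% voiced)
console.log(aggregate([V, V, S, E]));    // -1 (below 80% and one frame errored)
```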
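Dependencies:
- NAN (Native Abstractions for Node.js): C++ helper library that smooths over V8/Node.js API changes so the addon builds across Node.js versions (it is separate from N-API)
- bindings: Locates the compiled .node addon
- node-gyp: Build system using binding.gyp
- WebRTC VAD: Vendored in vendor/webrtc_vad/ with a custom webrtc_vad.gyp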
Common pitfalls:
- Sample rate changes: Once initialized, the VAD instance cannot change sample rates. Create a new instance if needed.
- Buffer sizes: Stream processing expects continuous data; very short buffers may not produce events until enough frames accumulate.
- Platform differences: The native module requires compilation. Pre-built binaries are not included, so node-gyp must run on install.
- Float conversion: The conversion in lib/vad.js:43-51 doubles the buffer size (int16 → float32). Be mindful of memory when processing large streams.
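One way to surface the fixed-sample-rate constraint early is a thin JavaScript guard that locks the rate on first use and fails fast on a change, instead of waiting for the native layer to reject it. FixedRateDetector is hypothetical, and processFn is a stand-in for whatever actually runs detection.

```javascript
// Hypothetical guard: locks the sample rate on first use and rejects changes.
class FixedRateDetector {
  constructor(processFn) {
    this.processFn = processFn; // stand-in for the real detection call
    this.sampleRate = null;     // unset until the first call
  }

  process(buffer, sampleRate) {
    if (this.sampleRate === null) {
      this.sampleRate = sampleRate;
    } else if (sampleRate !== this.sampleRate) {
      throw new Error(
        `Sample rate fixed at ${this.sampleRate} Hz; create a new instance for ${sampleRate} Hz`
      );
    }
    return this.processFn(buffer, sampleRate);
  }
}

// Usage with a dummy processFn that just reports the buffer length.
const detector = new FixedRateDetector((buf) => buf.length);
console.log(detector.process(Buffer.alloc(960), 16000)); // 960
try {
  detector.process(Buffer.alloc(480), 8000);
} catch (err) {
  console.log(err.message); // Sample rate fixed at 16000 Hz; create a new instance for 8000 Hz
}
```

For the float conversion point, the arithmetic is simple: 60 s of 16kHz mono int16 audio is 1,920,000 bytes, and its float32 copy is 3,840,000 bytes.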