
CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Project Overview

node-vad is a Voice Activity Detection (VAD) library for Node.js that wraps the WebRTC VAD algorithm from Chromium. It is a native addon that combines C/C++ code with JavaScript, using NAN (Native Abstractions for Node.js, not N-API) for the Node.js bindings.

Architecture

Three-Layer Architecture

  1. C Layer (src/simplevad.c, src/simplevad.h)

    • Implements vadAllocate(), vadInit(), vadSetMode(), and vadProcessAudio()
    • Manages state, frame buffering (30ms frames), and sample rate conversion
    • Wraps the WebRTC VAD implementation from vendor/webrtc_vad/
    • Converts float samples to 16-bit PCM and aggregates results using an 80% threshold
  2. C++ Bindings Layer (src/vad_bindings.cc)

    • Uses NAN to create Node.js bindings: vad_alloc, vad_init, vad_setmode, vad_processAudio
    • Implements async worker (VADWorker) for non-blocking audio processing
    • Handles Buffer/ArrayBufferView compatibility across Node.js versions
  3. JavaScript Layer (lib/vad.js)

    • VAD class: Main API with processAudio() and processAudioFloat() methods
    • VADStream class: Transform stream that chunks audio, debounces speech events, and emits structured output
    • Converts 16-bit signed integers to normalized floats (±1.0 range)
    • index.js exports the VAD class
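The path from the async native worker to a Promise result in the JS layer can be illustrated with a self-contained sketch. It assumes a callback-style native function taking (samples, sampleRate, callback); `fakeNativeProcess`, `callNativeAsync` and the numeric event codes are hypothetical stand-ins, not node-vad's actual binding signatures or constants.

```javascript
// Hypothetical event codes; the real constants live in lib/vad.js.
const Event = Object.freeze({ ERROR: -1, SILENCE: 0, VOICE: 1 });

// Stand-in for the native async binding. It defers its answer like an async
// worker would, and reports VOICE if any sample's magnitude exceeds 0.1.
function fakeNativeProcess(samples, sampleRate, callback) {
  setImmediate(() => {
    if (!(samples instanceof Float32Array)) {
      callback(new TypeError("samples must be a Float32Array"));
      return;
    }
    const loud = samples.some((s) => Math.abs(s) > 0.1);
    callback(null, loud ? Event.VOICE : Event.SILENCE);
  });
}

// Wrap a callback-style native call in a Promise, as a JS-layer method might.
function callNativeAsync(native, samples, sampleRate) {
  return new Promise((resolve, reject) => {
    native(samples, sampleRate, (err, event) => {
      if (err) reject(err);
      else resolve(event);
    });
  });
}
```

In the real addon, VADWorker does the classification off the main thread (NAN async workers run on the libuv thread pool), so the event loop stays free while frames are processed.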

Key Data Flow

Audio Buffer → VAD.processAudio() → toFloatBuffer() → Native binding →
vadProcessAudio() → Frame chunking (30ms) → WebRTC VAD → Event aggregation → Promise result

For streams: input chunks are buffered into 60 ms chunks (the byte length depends on the sample rate, e.g. 1,920 bytes of 16-bit PCM at 16 kHz), processed sequentially, and the debounced speech state is emitted with timing metadata.
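The stream-side buffering can be sketched as follows, assuming 16-bit mono PCM and a 60 ms chunk duration so that the byte length scales with the sample rate. `chunkBytes` and `makeChunker` are hypothetical helpers for illustration, not functions from lib/vad.js.

```javascript
// Bytes per chunk for 16-bit mono PCM: samples per chunk x 2 bytes.
function chunkBytes(sampleRate, chunkMs = 60) {
  return ((sampleRate * chunkMs) / 1000) * 2;
}

// Returns a push(data) function that accumulates arbitrary-sized Buffers and
// returns only complete chunks; leftover bytes wait for the next push.
function makeChunker(sampleRate, chunkMs = 60) {
  const size = chunkBytes(sampleRate, chunkMs);
  let pending = Buffer.alloc(0);
  return function push(data) {
    pending = Buffer.concat([pending, data]);
    const chunks = [];
    while (pending.length >= size) {
      chunks.push(pending.subarray(0, size));
      pending = pending.subarray(size);
    }
    return chunks;
  };
}
```

With `makeChunker(16000)`, pushing 1,000 bytes yields nothing; pushing 3,000 more yields two 1,920-byte chunks and keeps 160 bytes buffered.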

Build & Development

Building the Native Module

```bash
# Install dependencies and build the native addon
npm install  # runs node-gyp rebuild automatically

# Rebuild manually
node-gyp rebuild

# Clean build
node-gyp clean
node-gyp configure
node-gyp build
```

Testing Changes

```bash
# Run the non-stream example (from the repository root)
cd examples
node process.js

# Run the stream example (from the repository root)
cd examples
node stream.js
```

Both examples use examples/demo_pcm_s16_16000.raw as test audio (16-bit PCM, 16kHz).
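To sanity-check that test audio before feeding it to the examples, a small helper can report its sample count, duration and peak level. `describePcm` is a hypothetical helper, and little-endian byte order for the .raw file is an assumption.

```javascript
// Summarize a raw 16-bit mono PCM buffer (little-endian byte order assumed).
function describePcm(buf, sampleRate) {
  const samples = Math.floor(buf.length / 2); // 2 bytes per sample
  let peak = 0;
  for (let i = 0; i < samples; i++) {
    peak = Math.max(peak, Math.abs(buf.readInt16LE(i * 2)));
  }
  return {
    samples,
    durationMs: (samples / sampleRate) * 1000,
    peak, // largest absolute sample value, 0..32768
  };
}
```

For example, from a script inside examples/: `describePcm(require("fs").readFileSync("demo_pcm_s16_16000.raw"), 16000)`.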

Release Process

```bash
npm run release-patch  # bumps patch version, publishes
npm run release-minor  # bumps minor version, publishes
npm run release-major  # bumps major version, publishes
```

Important Implementation Details

Sample Rate Requirements

  • Supported rates: 8000Hz, 16000Hz (recommended), 32000Hz, 48000Hz
  • Sample rate must remain constant after initialization (enforced in src/simplevad.c)
  • 16kHz is optimal for performance/accuracy tradeoff
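The frame-size arithmetic behind these rates (samples per frame = rate x 30 ms / 1000) and a guard that mimics the fixed-rate rule can be sketched as follows. `frameSamples` and `SampleRateGuard` are hypothetical; the real enforcement lives in src/simplevad.c.

```javascript
const SUPPORTED_RATES = [8000, 16000, 32000, 48000];

// Samples in one frame: 240, 480, 960 or 1440 for 30 ms at the rates above.
function frameSamples(sampleRate, frameMs = 30) {
  if (!SUPPORTED_RATES.includes(sampleRate)) {
    throw new RangeError(`unsupported sample rate: ${sampleRate}`);
  }
  return (sampleRate * frameMs) / 1000;
}

// Hypothetical guard mirroring "rate fixed after init": the first valid rate
// seen is remembered, and any later, different rate is rejected.
class SampleRateGuard {
  constructor() {
    this.rate = null;
  }

  check(sampleRate) {
    frameSamples(sampleRate); // throws on unsupported rates
    if (this.rate === null) {
      this.rate = sampleRate;
    } else if (sampleRate !== this.rate) {
      throw new Error(`sample rate changed from ${this.rate} to ${sampleRate}`);
    }
  }
}
```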

Audio Format

  • Input: 16-bit signed PCM (via processAudio()) or 32-bit normalized float (via processAudioFloat())
  • Internal: Converted to float range [-1.0, 1.0] in lib/vad.js:43-51
  • Frame size: 30ms chunks (e.g., 480 samples at 16kHz)
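Both directions of the sample conversion can be sketched as below. Dividing by 32768 is an assumption about the exact scale factor in lib/vad.js; it maps int16 onto [-1.0, 1.0). The inverse mirrors the float-to-16-bit step the C layer performs, with clamping added here as a precaution; `toFloat32` and `toInt16` are hypothetical names.

```javascript
// int16 PCM (little-endian Buffer) -> Float32Array in [-1.0, 1.0).
function toFloat32(pcm) {
  const out = new Float32Array(Math.floor(pcm.length / 2));
  for (let i = 0; i < out.length; i++) {
    out[i] = pcm.readInt16LE(i * 2) / 32768;
  }
  return out;
}

// Float samples -> Int16Array, using the same scale and clamping to int16.
function toInt16(samples) {
  const out = new Int16Array(samples.length);
  for (let i = 0; i < samples.length; i++) {
    const v = Math.round(samples[i] * 32768);
    out[i] = Math.max(-32768, Math.min(32767, v));
  }
  return out;
}
```

Using the same power-of-two scale in both directions makes the round trip exact for every int16 value.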

VAD Modes

The mode sets detection sensitivity (from lib/vad.js:173-178):

  • NORMAL: For high-bitrate, low-noise audio (may classify noise as speech)
  • LOW_BITRATE: Optimized for low-bitrate audio
  • AGGRESSIVE: Better for noisy environments
  • VERY_AGGRESSIVE: Lowest miss rate, works for most inputs
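WebRTC VAD itself takes an aggressiveness level from 0 to 3 via `WebRtcVad_set_mode()`. A sketch of how the four names might map onto those levels follows; the numeric values and the `Mode`/`checkMode` names are assumptions for illustration, so check lib/vad.js for the actual constants.

```javascript
// Hypothetical mode table; WebRTC VAD accepts aggressiveness 0-3.
const Mode = Object.freeze({
  NORMAL: 0,
  LOW_BITRATE: 1,
  AGGRESSIVE: 2,
  VERY_AGGRESSIVE: 3,
});

// Validate a mode before it would be passed to the native vad_setmode binding.
function checkMode(mode) {
  if (!Object.values(Mode).includes(mode)) {
    throw new RangeError(`invalid VAD mode: ${mode}`);
  }
  return mode;
}
```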

Stream Processing

The VADStream class (lib/vad.js:54-164):

  • Buffers incomplete input and processes complete 60 ms chunks (1,920 bytes of 16-bit PCM at 16 kHz)
  • Uses debounceTime (default 1000ms) to prevent rapid speech state toggling
  • Emits objects with time, audioData, and speech metadata (state, start, end, startTime, duration)
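The debouncing can be modelled as a small state machine over per-chunk decisions. This is a sketch under stated assumptions: speech starts on the first VOICE chunk and ends only after no VOICE has been seen for longer than `debounceTime`. `makeDebouncer`, the `duration` rule and the omission of `audioData` are simplifications that may differ from lib/vad.js.

```javascript
// Turn per-chunk decisions into debounced speech metadata.
// `time` is the chunk's end time in ms; `isVoice` is that chunk's decision.
function makeDebouncer(debounceTime = 1000) {
  let state = false; // currently inside a speech segment?
  let startTime = 0; // when the current segment started (ms)
  let lastSpeech = 0; // time of the most recent VOICE chunk (ms)

  return function update(time, isVoice) {
    let start = false;
    let end = false;
    if (isVoice) {
      if (!state) {
        state = true;
        start = true;
        startTime = time;
      }
      lastSpeech = time;
    } else if (state && time - lastSpeech > debounceTime) {
      state = false;
      end = true;
    }
    // Duration of the current (or just-ended) segment; 0 when idle.
    const duration = state || end ? time - startTime : 0;
    return { time, speech: { state, start, end, startTime, duration } };
  };
}
```

Short silences inside the debounce window leave `state` true, which is what stops the rapid toggling described above.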

Event Aggregation

In src/simplevad.c:202-221, the vadDecision() function:

  • Processes multiple frames per call
  • Uses an 80% threshold rather than a simple majority: if at least 80% of frames are VOICE, it returns VOICE
  • Otherwise returns SILENCE (or ERROR if any frame errors)
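The aggregation rule can be sketched as a pure function over per-frame results. The event values, the `aggregate` name, treating any ERROR as taking precedence, and returning SILENCE for an empty list are all assumptions for illustration, not a transcription of vadDecision().

```javascript
// Hypothetical per-frame result codes.
const Event = Object.freeze({ ERROR: -1, SILENCE: 0, VOICE: 1 });

// Any ERROR wins (assumed precedence); otherwise VOICE if at least
// `threshold` of the frames are VOICE, else SILENCE.
function aggregate(frameEvents, threshold = 0.8) {
  if (frameEvents.length === 0) return Event.SILENCE;
  if (frameEvents.includes(Event.ERROR)) return Event.ERROR;
  const voiced = frameEvents.filter((e) => e === Event.VOICE).length;
  return voiced / frameEvents.length >= threshold ? Event.VOICE : Event.SILENCE;
}
```

Note that with only two frames per call, an 80% threshold means both frames must be VOICE.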

Native Module Dependencies

  • NAN (Native Abstractions for Node.js): Header-only C++ helpers that smooth over V8/Node.js API changes across Node.js versions (distinct from N-API)
  • bindings: Locates the compiled .node addon
  • node-gyp: Build system using binding.gyp
  • WebRTC VAD: Vendored in vendor/webrtc_vad/ with custom webrtc_vad.gyp

Common Pitfalls

  1. Sample rate changes: Once initialized, the VAD instance cannot change sample rates. Create a new instance if needed.
  2. Buffer sizes: Stream processing expects continuous data; very short buffers may not produce events until enough frames accumulate.
  3. Platform differences: The native module requires compilation. Pre-built binaries are not included, so node-gyp (which needs Python and a C/C++ compiler toolchain) must run on install.
  4. Float conversion: The lib/vad.js:43-51 conversion doubles buffer size (int16 → float32). Be mindful of memory when processing large streams.
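The memory cost of that conversion is easy to estimate: each 2-byte int16 sample becomes a 4-byte float32 sample. For one minute of 16 kHz mono audio, 1,920,000 bytes of int16 become 3,840,000 bytes of float32 (about 3.7 MiB), and both buffers are alive while the conversion runs. `conversionBytes` is a hypothetical helper for the arithmetic.

```javascript
// Bytes held before and after int16 -> float32 conversion of mono PCM.
function conversionBytes(seconds, sampleRate = 16000) {
  const samples = seconds * sampleRate;
  return { int16: samples * 2, float32: samples * 4 };
}
```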