
[Example] 530 — Silero VAD Speech Segmentation with Deepgram STT (Python)#208

Open
github-actions[bot] wants to merge 1 commit into main from
example/530-silero-vad-speech-segmentation-python

Conversation

@github-actions
Contributor

@github-actions github-actions bot commented Apr 8, 2026

New example: Silero VAD Speech Segmentation with Deepgram STT

Integration: Silero VAD | Language: Python | Products: STT

What this shows

Demonstrates how to use Silero VAD (Voice Activity Detection) to detect speech regions in an audio file, extract each segment, and transcribe them individually with Deepgram. This covers a common pre-processing pipeline: detect speech boundaries locally with Silero VAD, slice the waveform, and send each speech chunk to Deepgram's nova-3 model for transcription.
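The "slice the waveform" step of this pipeline (cut each detected speech region out of the audio and wrap it as a standalone WAV before sending it to Deepgram) needs only the standard library. The helper below is an illustrative sketch, not the example's actual `extract_segment_bytes`; its name and signature are assumptions:

```python
import io
import wave

def slice_wav_bytes(pcm: bytes, sample_rate: int, start_s: float, end_s: float,
                    sample_width: int = 2, channels: int = 1) -> bytes:
    """Cut [start_s, end_s) out of raw 16-bit mono PCM and wrap it as a
    complete WAV file, ready to upload as one speech segment."""
    frame_size = sample_width * channels
    start = int(start_s * sample_rate) * frame_size
    end = int(end_s * sample_rate) * frame_size
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(channels)
        w.setsampwidth(sample_width)
        w.setframerate(sample_rate)
        w.writeframes(pcm[start:end])
    return buf.getvalue()
```

Each returned byte string is a self-contained WAV, so every segment can be transcribed independently.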

Required secrets

Only DEEPGRAM_API_KEY is required (Silero VAD runs locally and needs no credentials of its own)

Tests

✅ Tests passed

── detect_speech_regions ──
  Found 4 speech region(s)
  ✓ detect_speech_regions finds speech in real audio

── extract_segment_bytes ──
  ✓ extract_segment_bytes produces valid WAV
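The "produces valid WAV" check exercised here needs nothing beyond the stdlib wave module. A minimal validator (a hypothetical helper, not the test suite's actual code) could look like:

```python
import io
import wave

def is_valid_wav(data: bytes) -> bool:
    """True if the bytes parse as a WAV container holding at least one audio frame."""
    try:
        with wave.open(io.BytesIO(data), "rb") as w:
            return w.getnframes() > 0
    except (wave.Error, EOFError):
        # wave.Error: not a RIFF/WAVE file; EOFError: truncated header
        return False
```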

── process_audio (end-to-end) ──
  Transcribed 4 segment(s), 331 total chars
  ✓ End-to-end pipeline transcribes real speech correctly

── VAD parameters ──
  threshold=0.5 → 4 regions, threshold=0.9 → 8 regions
  ✓ VAD parameters affect segmentation output

Results: 4 passed, 0 failed
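The threshold behavior the last test exercises (a stricter threshold gates out borderline frames, which can split one region into several) can be illustrated on synthetic per-frame speech probabilities. The probability trace below is invented for illustration; it is not Silero's actual output, and this toy gate ignores Silero's smoothing and minimum-duration logic:

```python
def regions_above(probs, threshold):
    """Group consecutive frames whose speech probability meets the threshold
    into (start_frame, end_frame) regions — a toy stand-in for VAD gating."""
    regions, start = [], None
    for i, p in enumerate(probs):
        if p >= threshold and start is None:
            start = i                      # region opens
        elif p < threshold and start is not None:
            regions.append((start, i))     # region closes
            start = None
    if start is not None:
        regions.append((start, len(probs)))
    return regions

# A dip to 0.6 mid-utterance survives threshold 0.5 but splits the
# region in two at threshold 0.9 — the same direction of effect as
# the "threshold=0.5 → 4 regions, threshold=0.9 → 8 regions" result.
probs = [0.1, 0.95, 0.97, 0.6, 0.96, 0.94, 0.2]
```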

Built by Engineer on 2026-04-08

@github-actions
Contributor Author

github-actions bot commented Apr 8, 2026

Code Review

Overall: APPROVED

Tests ran ✅

── detect_speech_regions ──
  Found 4 speech region(s)
    0.07s - 2.69s
    2.79s - 4.32s
    4.51s - 12.54s
    12.71s - 25.41s
  ✓ detect_speech_regions finds speech in real audio

── extract_segment_bytes ──
  ✓ extract_segment_bytes produces valid WAV

── process_audio (end-to-end) ──
Detected 4 speech region(s). Transcribing...
  Transcribed 4 segment(s), 331 total chars
    [0.1s-2.7s] conf=1.00 'Yeah. As as much as, it's worth'
    [2.8s-4.3s] conf=0.53 'Celebrating'
    [4.5s-12.5s] conf=1.00 'The first, spacewalk, with an all female team...'
    [12.7s-25.4s] conf=1.00 'And, I think if it signifies anything, it is, to honor the t...'
  ✓ End-to-end pipeline transcribes real speech correctly

── VAD parameters ──
  threshold=0.5 → 4 regions, threshold=0.9 → 8 regions
  ✓ VAD parameters affect segmentation output

Results: 4 passed, 0 failed

Integration genuineness

Pass — Silero VAD is a local audio processing library (no cloud API). The example correctly:

  1. Imports and uses silero_vad (load_silero_vad, get_speech_timestamps, read_audio)
  2. Makes real VAD calls on actual audio — not mocked or hardcoded
  3. .env.example appropriately lists only DEEPGRAM_API_KEY (Silero VAD has no credentials)
  4. Tests exit code 2 on missing credentials, real Deepgram API calls in e2e test
  5. Audio flows through Silero VAD first (segmentation), then each segment goes to Deepgram — correct pipeline pattern
  6. No raw WebSocket/fetch calls — uses official deepgram-sdk

Code quality

  • ✅ Official Deepgram SDK (deepgram-sdk==6.1.1) — correct version
  • tag="deepgram-examples" present on all Deepgram API calls
  • ✅ No hardcoded credentials
  • ✅ Error handling: credential check in main(), file existence check, proper exit codes
  • ✅ Tests import from src/ and call the example's actual functions (detect_speech_regions, extract_segment_bytes, process_audio)
  • ✅ Transcript assertions use length/duration proportionality (min_chars = max(5, audio_duration_sec * 2)) — no word lists
  • ✅ Credential check runs before SDK imports in tests (lines 13-22 before from segmenter import ...)
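The credential-gating pattern described above (check the env var at module top and exit with status 2 before any SDK import can throw) reduces to a few lines; the message text here is illustrative:

```python
import os
import sys

def require_api_key(name: str = "DEEPGRAM_API_KEY") -> str:
    """Return the key, or exit with status 2 — the convention these tests
    use to signal 'skipped for missing credentials' rather than failure."""
    key = os.environ.get(name)
    if not key:
        print(f"{name} is not set; skipping tests that need it.", file=sys.stderr)
        sys.exit(2)
    return key
```

Run at module level, before `from segmenter import ...`, this guarantees a missing key produces exit code 2 instead of an import-time traceback.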

Documentation

  • ✅ README: "What you'll build" section, env vars with links, install/run instructions, CLI options, key parameters, how it works
  • .env.example present and complete

✓ All checks pass. Ready for merge.


Review by Lead on 2026-04-08

@github-actions github-actions bot added the status:review-passed Self-review passed label Apr 8, 2026
@github-actions
Contributor Author

github-actions bot commented Apr 9, 2026

Code Review

Overall: APPROVED

Tests ran ✅

tests/test_example.py::test_detect_speech_regions PASSED
tests/test_example.py::test_extract_segment_bytes PASSED
tests/test_example.py::test_process_audio_end_to_end PASSED
tests/test_example.py::test_vad_parameters_affect_output PASSED

4 passed, 12 warnings in 5.91s

── detect_speech_regions ──
  Found 4 speech region(s)
    0.07s - 2.69s
    2.79s - 4.32s
    4.51s - 12.54s
    12.71s - 25.41s
  ✓ detect_speech_regions finds speech in real audio

── extract_segment_bytes ──
  ✓ extract_segment_bytes produces valid WAV

── process_audio (end-to-end) ──
  Transcribed 4 segment(s), 331 total chars
    [0.1s-2.7s] conf=1.00
    [2.8s-4.3s] conf=0.53
    [4.5s-12.5s] conf=1.00
    [12.7s-25.4s] conf=1.00
  ✓ End-to-end pipeline transcribes real speech correctly

── VAD parameters ──
  threshold=0.5 → 4 regions, threshold=0.9 → 8 regions
  ✓ VAD parameters affect segmentation output

Results: 4 passed, 0 failed

Integration genuineness

Pass.

  1. ✅ Silero VAD SDK imported and used (from silero_vad import get_speech_timestamps, load_silero_vad, read_audio)
  2. ✅ Real VAD calls made — get_speech_timestamps() runs locally on audio waveform
  3. .env.example lists DEEPGRAM_API_KEY (Silero VAD runs locally — no API key needed for it)
  4. ✅ Tests exit with code 2 on missing credentials
  5. ✅ BYPASS CHECK: Silero VAD is a local pre-processing library (not a Deepgram audio interface), so DeepgramClient direct usage is correct — audio flows through Silero VAD segmentation first, then each segment is transcribed via Deepgram
  6. ✅ NO RAW PROTOCOL CHECK: Uses client.listen.v1.media.transcribe_file() — no raw WebSocket/fetch

Code quality

  • ✅ Official Deepgram SDK (deepgram-sdk==6.1.1 — matches required v6.1.1)
  • tag="deepgram-examples" present on Deepgram API call
  • ✅ No hardcoded credentials
  • ✅ Error handling for missing credentials and file not found
  • ✅ Tests import from src/ and call the example's actual functions
  • ✅ Transcript assertions use length/duration proportionality (min_chars = max(5, audio_duration_sec * 2)) — no word lists
  • ✅ Credential check runs first (top of test file, before SDK imports)
  • ⚠️ Minor: import torchaudio in segmenter.py line 24 is unused (silero-vad uses it internally, but the import in this file is dead code). Please remove it.
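The length/duration proportionality check praised above reduces to a one-line floor function. The factor of 2 chars/sec and the floor of 5 come from the review log's `min_chars = max(5, audio_duration_sec * 2)`; the `int()` cast is added here for a clean character count and is an assumption:

```python
def min_expected_chars(audio_duration_sec: float) -> int:
    """Floor on transcript length: roughly 2 characters per second of audio,
    never below 5, so even very short clips get a meaningful assertion
    without brittle word-list matching."""
    return max(5, int(audio_duration_sec * 2))
```

A test would then assert `len(transcript) >= min_expected_chars(duration)` rather than looking for specific words.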

Documentation

  • ✅ README has "What you'll build" section
  • ✅ Environment variables table with console link
  • ✅ Install and run instructions
  • ✅ CLI options and key parameters documented
  • .env.example present and complete

✓ All checks pass. Ready for merge.

One minor nit: remove the unused import torchaudio from src/segmenter.py.


Review by Lead on 2026-04-09

@github-actions
Contributor Author

github-actions bot commented Apr 9, 2026

Code Review

Overall: APPROVED

Tests ran ✅

tests/test_example.py::test_detect_speech_regions PASSED
tests/test_example.py::test_extract_segment_bytes PASSED
tests/test_example.py::test_process_audio_end_to_end PASSED
tests/test_example.py::test_vad_parameters_affect_output PASSED

── detect_speech_regions ──
  Found 4 speech region(s)
    0.07s - 2.69s
    2.79s - 4.32s
    4.51s - 12.54s
    12.71s - 25.41s
  ✓ detect_speech_regions finds speech in real audio

── extract_segment_bytes ──
  ✓ extract_segment_bytes produces valid WAV

── process_audio (end-to-end) ──
  Transcribed 4 segment(s), 331 total chars
    [0.1s-2.7s] conf=1.00
    [2.8s-4.3s] conf=0.53
    [4.5s-12.5s] conf=1.00
    [12.7s-25.4s] conf=1.00
  ✓ End-to-end pipeline transcribes real speech correctly

── VAD parameters ──
  threshold=0.5 → 4 regions, threshold=0.9 → 8 regions
  ✓ VAD parameters affect segmentation output

Results: 4 passed, 0 failed

Integration genuineness

Pass — All 6 checks passed:

  1. Silero VAD SDK imported and used (get_speech_timestamps, load_silero_vad, read_audio)
  2. Real VAD call made on actual audio — not mocked or hardcoded
  3. .env.example lists DEEPGRAM_API_KEY (Silero VAD is local, no API key needed — legitimate)
  4. Tests exit with code 2 if credentials missing
  5. Bypass check passed — Silero VAD is a local speech detection model (not a Deepgram wrapper), so DeepgramClient usage for transcription is correct
  6. No raw protocol — all Deepgram API contact uses the official SDK

Code quality

  • ✅ Official Deepgram SDK (deepgram-sdk==6.1.1) — correct pinned version
  • tag="deepgram-examples" present on Deepgram API call
  • ✅ No hardcoded credentials
  • ✅ Error handling covers missing API key, missing file, empty speech detection
  • ✅ Tests import from src/ and call the example's actual functions
  • ✅ Transcript assertions use length/duration proportionality — no specific word lists
  • ✅ Credential check runs first (before SDK imports that could throw)

Documentation

  • ✅ README includes "What you'll build", env vars table with console link, install/run instructions, CLI options, parameter table, and architecture explanation
  • .env.example present and complete

✓ All checks pass. Ready for merge.


Review by Lead on 2026-04-09

@github-actions
Contributor Author

github-actions bot commented Apr 9, 2026

Code Review

Overall: APPROVED

Tests ran ✅

tests/test_example.py::test_detect_speech_regions PASSED                 [ 25%]
tests/test_example.py::test_extract_segment_bytes PASSED                 [ 50%]
tests/test_example.py::test_process_audio_end_to_end PASSED              [ 75%]
tests/test_example.py::test_vad_parameters_affect_output PASSED          [100%]

4 passed, 12 warnings in 6.87s

── detect_speech_regions ──
  Found 4 speech region(s)
    0.07s - 2.69s / 2.79s - 4.32s / 4.51s - 12.54s / 12.71s - 25.41s
  ✓ detect_speech_regions finds speech in real audio

── extract_segment_bytes ──
  ✓ extract_segment_bytes produces valid WAV

── process_audio (end-to-end) ──
  Transcribed 4 segment(s), 331 total chars
  ✓ End-to-end pipeline transcribes real speech correctly

── VAD parameters ──
  threshold=0.5 → 4 regions, threshold=0.9 → 8 regions
  ✓ VAD parameters affect segmentation output

Integration genuineness

Pass — All 6 checks passed:

  1. Silero VAD SDK imported (from silero_vad import get_speech_timestamps, load_silero_vad, read_audio)
  2. Real VAD calls: get_speech_timestamps() runs on actual audio waveform
  3. .env.example lists DEEPGRAM_API_KEY — correct since Silero VAD is a local library (no API key needed)
  4. Tests exit 2 on missing credentials (test_example.py:20-22)
  5. Bypass check: Pass — Silero VAD is a local preprocessing library (not an audio/speech interface wrapping Deepgram), so DeepgramClient usage for transcription is the correct pattern
  6. No raw protocol: Pass — all Deepgram calls use the official SDK via client.listen.v1.media.transcribe_file()

Code quality

  • ✅ Official Deepgram SDK: deepgram-sdk==6.1.1 (matches required version)
  • tag="deepgram-examples" present on Deepgram API call (segmenter.py:106)
  • ✅ No hardcoded credentials
  • ✅ Error handling: credential check + file existence check in main()
  • ✅ Tests import from src/ and call the example's actual functions (detect_speech_regions, extract_segment_bytes, process_audio)
  • ✅ Transcript assertions use length/duration proportionality (min_chars based on audio_duration_sec * 2) — no specific word lists
  • ✅ Credential check runs first in test (module-level, before any SDK imports that could throw)
  • ⚠️ Minor: import torchaudio (segmenter.py:24) is unused — silero_vad.read_audio handles audio loading. Harmless since torchaudio is a silero_vad dependency anyway, but could be removed for cleanliness.

Documentation

  • ✅ README has "What you'll build" section
  • ✅ Environment variables table with link to Deepgram console
  • ✅ Install and run instructions with CLI options
  • ✅ Key parameters table
  • ✅ "How it works" explanation
  • .env.example present and complete

✓ All checks pass. Ready for merge.

Note: Attempted to push a minor cleanup (remove unused torchaudio import) but lacked push permissions. This is non-blocking.


Review by Lead on 2026-04-09

@github-actions
Contributor Author

github-actions bot commented Apr 9, 2026

Code Review

Overall: APPROVED

Tests ran ✅

tests/test_example.py::test_detect_speech_regions PASSED
tests/test_example.py::test_extract_segment_bytes PASSED
tests/test_example.py::test_process_audio_end_to_end PASSED
tests/test_example.py::test_vad_parameters_affect_output PASSED

4 passed, 0 failed

── detect_speech_regions ──
  Found 4 speech region(s)
    0.07s - 2.69s
    2.79s - 4.32s
    4.51s - 12.54s
    12.71s - 25.41s
  ✓ detect_speech_regions finds speech in real audio

── extract_segment_bytes ──
  ✓ extract_segment_bytes produces valid WAV

── process_audio (end-to-end) ──
  Transcribed 4 segment(s), 331 total chars
    [0.1s-2.7s] conf=1.00 'Yeah. As as much as, it's worth'
    [2.8s-4.3s] conf=0.53 'Celebrating'
    [4.5s-12.5s] conf=1.00 'The first, spacewalk, with an all female team...'
    [12.7s-25.4s] conf=1.00 'And, I think if it signifies anything, it is...'
  ✓ End-to-end pipeline transcribes real speech correctly

── VAD parameters ──
  threshold=0.5 → 4 regions, threshold=0.9 → 8 regions
  ✓ VAD parameters affect segmentation output

Integration genuineness

Pass — All 6 checks satisfied:

  1. Silero VAD SDK imported and used (get_speech_timestamps, load_silero_vad, read_audio)
  2. Real VAD inference runs on actual audio — not mocked or hardcoded
  3. .env.example lists DEEPGRAM_API_KEY (Silero VAD is a local library — no API key needed)
  4. Tests exit(2) on missing credentials before any SDK imports
  5. Bypass check OK: Silero VAD is a local processing library with no Deepgram audio interface — DeepgramClient usage is appropriate for sending VAD-segmented chunks to Deepgram
  6. No raw protocol: All Deepgram contact goes through deepgram-sdk

Code quality

  • deepgram-sdk==6.1.1 — matches required Python SDK version
  • tag="deepgram-examples" present on every Deepgram API call
  • ✅ No hardcoded credentials; credential check in main() and in tests
  • ✅ Error handling covers missing key and missing file
  • ✅ Tests import from src/ and call the example's actual functions (detect_speech_regions, extract_segment_bytes, process_audio)
  • ✅ Transcript assertions use length/duration proportionality (audio_duration_sec * 2 min chars), not specific word lists
  • ✅ Credential check runs FIRST in tests (lines 13-22) before any SDK imports
  • ℹ️ Minor: import torchaudio (line 24) and soundfile in requirements.txt are not directly used in segmenter.py — they're transitive dependencies for silero-vad. Not a blocker.

Documentation

  • ✅ README has "What you'll build", env vars table with console links, install/run instructions, CLI options, and "How it works" explanation
  • .env.example present and complete

✓ All checks pass. Ready for merge.


Review by Lead on 2026-04-09

@github-actions
Contributor Author

github-actions bot commented Apr 9, 2026

Code Review

Overall: APPROVED

Tests ran ✅

tests/test_example.py::test_detect_speech_regions PASSED                 [ 25%]
tests/test_example.py::test_extract_segment_bytes PASSED                 [ 50%]
tests/test_example.py::test_process_audio_end_to_end PASSED              [ 75%]
tests/test_example.py::test_vad_parameters_affect_output PASSED          [100%]

4 passed, 12 warnings in 7.99s

── detect_speech_regions ──
  Found 4 speech region(s)
    0.07s - 2.69s
    2.79s - 4.32s
    4.51s - 12.54s
    12.71s - 25.41s
  ✓ detect_speech_regions finds speech in real audio

── extract_segment_bytes ──
  ✓ extract_segment_bytes produces valid WAV

── process_audio (end-to-end) ──
  Transcribed 4 segment(s), 331 total chars
  ✓ End-to-end pipeline transcribes real speech correctly

── VAD parameters ──
  threshold=0.5 → 4 regions, threshold=0.9 → 8 regions
  ✓ VAD parameters affect segmentation output

Integration genuineness

Pass — All 6 checks satisfied:

  1. Silero VAD SDK imported and used (silero_vad.get_speech_timestamps, load_silero_vad, read_audio)
  2. Real VAD inference runs locally on audio waveform — not mocked
  3. .env.example lists DEEPGRAM_API_KEY; Silero VAD is a local model requiring no API key — correct
  4. Tests exit(2) on missing credentials before any SDK imports
  5. Bypass check pass: Silero VAD is a local pre-processing library with no Deepgram interface — audio correctly flows through Silero for segmentation, then each segment is sent to Deepgram via the SDK
  6. No raw protocol: All Deepgram API contact uses DeepgramClient; no raw WebSocket/fetch calls

Code quality

  • ✅ Official Deepgram SDK (deepgram-sdk==6.1.1) — correct version
  • tag="deepgram-examples" present on all Deepgram API calls (segmenter.py:106)
  • ✅ No hardcoded credentials
  • ✅ Error handling: credential check and file existence check in main()
  • ✅ Tests import from src/ and call actual example functions (detect_speech_regions, extract_segment_bytes, process_audio)
  • ✅ Transcript assertions use length/duration proportionality (audio_duration_sec * 2 min chars) — no specific word lists
  • ✅ Credential check runs first in tests (lines 13–22) before any src imports

Documentation

  • ✅ README includes "What you'll build", env vars table with console links, install/run instructions, CLI options, and how-it-works section
  • .env.example present and complete

✓ All checks pass. Ready for merge.


Review by Lead on 2026-04-09

@github-actions
Contributor Author

Code Review

Overall: APPROVED

Tests ran ✅

tests/test_example.py::test_detect_speech_regions PASSED
tests/test_example.py::test_extract_segment_bytes PASSED
tests/test_example.py::test_process_audio_end_to_end PASSED
tests/test_example.py::test_vad_parameters_affect_output PASSED

── detect_speech_regions ──
  Found 4 speech region(s)
    0.07s - 2.69s / 2.79s - 4.32s / 4.51s - 12.54s / 12.71s - 25.41s
  ✓ detect_speech_regions finds speech in real audio

── extract_segment_bytes ──
  ✓ extract_segment_bytes produces valid WAV

── process_audio (end-to-end) ──
  Transcribed 4 segment(s), 331 total chars
  ✓ End-to-end pipeline transcribes real speech correctly

── VAD parameters ──
  threshold=0.5 → 4 regions, threshold=0.9 → 8 regions
  ✓ VAD parameters affect segmentation output

4 passed, 0 failed

Integration genuineness

Pass — All 6 checks satisfied:

  1. Silero VAD SDK imported and used (load_silero_vad, get_speech_timestamps, read_audio)
  2. Real VAD inference runs on actual audio — not mocked or hardcoded
  3. .env.example complete (Silero VAD is local-only, no extra credentials needed)
  4. Tests exit 2 on missing credentials, make real Deepgram API calls
  5. Bypass check: Silero VAD is a pre-processing library with no Deepgram interface — DeepgramClient is correctly used directly for transcription
  6. No raw WebSocket/fetch calls — official deepgram-sdk used throughout

Code quality

  • deepgram-sdk==6.1.1 — correct pinned version
  • tag="deepgram-examples" present on all Deepgram API calls
  • ✅ No hardcoded credentials
  • ✅ Error handling: credential check + file existence check in main()
  • ✅ Tests import from src/ and call the example's actual functions
  • ✅ Transcript assertions use length/duration proportionality — no word-list flakiness
  • ✅ Credential check runs first (module-level in test file, before any SDK imports)
  • ⚠️ Minor: import torchaudio in segmenter.py is unused (torchaudio is used internally by silero_vad, but the explicit import is unnecessary). Non-blocking.

Documentation

  • ✅ README includes: what you'll build, env vars with console links, install/run instructions, CLI options, and how-it-works explanation
  • .env.example present and complete

✓ All checks pass. Ready for merge.

Minor nit: Remove the unused import torchaudio from src/segmenter.py (line 24). Could not push fix due to permissions.


Review by Lead on 2026-04-10

@github-actions
Contributor Author

Code Review

Overall: APPROVED

Tests ran ✅

tests/test_example.py::test_detect_speech_regions PASSED                 [ 25%]
tests/test_example.py::test_extract_segment_bytes PASSED                 [ 50%]
tests/test_example.py::test_process_audio_end_to_end PASSED              [ 75%]
tests/test_example.py::test_vad_parameters_affect_output PASSED          [100%]

======================== 4 passed, 12 warnings in 7.80s ========================

Detailed output:
── detect_speech_regions ──
  Found 4 speech region(s)
    0.07s - 2.69s / 2.79s - 4.32s / 4.51s - 12.54s / 12.71s - 25.41s
  ✓ detect_speech_regions finds speech in real audio

── extract_segment_bytes ──
  ✓ extract_segment_bytes produces valid WAV

── process_audio (end-to-end) ──
  Transcribed 4 segment(s), 331 total chars
  ✓ End-to-end pipeline transcribes real speech correctly

── VAD parameters ──
  threshold=0.5 → 4 regions, threshold=0.9 → 8 regions
  ✓ VAD parameters affect segmentation output

Integration genuineness

Pass — Silero VAD SDK is imported and used for real speech detection (get_speech_timestamps, load_silero_vad, read_audio). Audio flows through Silero VAD for segmentation, then each extracted segment is sent to Deepgram via the official SDK. Silero VAD is a local model (no API key required), so .env.example correctly only lists DEEPGRAM_API_KEY. No bypass — no raw WebSocket/fetch calls.

Code quality

  • ✅ Official Deepgram SDK deepgram-sdk==6.1.1 (current required version)
  • tag="deepgram-examples" present on Deepgram API calls
  • ✅ No hardcoded credentials
  • ✅ Error handling: credential check + file existence check in main()
  • ✅ Tests import from src/ and call actual example functions
  • ✅ Transcript assertions use length/duration proportionality (not word lists)
  • ✅ Credential check runs first (exit 2 on missing creds) before SDK imports
  • ⚠️ Minor: import torchaudio (line 24) is unused directly — it's a transitive dependency of silero_vad. Non-blocking.

Documentation

  • ✅ README has "What you'll build", env vars table with console links, install/run instructions, CLI options, key parameters, and "How it works" explanation
  • .env.example present and complete

✓ All checks pass. Ready for merge.


Review by Lead on 2026-04-10

@github-actions
Contributor Author

Code Review

Overall: APPROVED

Tests ran ✅

tests/test_example.py::test_detect_speech_regions PASSED
tests/test_example.py::test_extract_segment_bytes PASSED
tests/test_example.py::test_process_audio_end_to_end PASSED
tests/test_example.py::test_vad_parameters_affect_output PASSED

4 passed, 0 failed

── detect_speech_regions ──
  Found 4 speech region(s)
    0.07s - 2.69s | 2.79s - 4.32s | 4.51s - 12.54s | 12.71s - 25.41s
  ✓ detect_speech_regions finds speech in real audio

── extract_segment_bytes ──
  ✓ extract_segment_bytes produces valid WAV

── process_audio (end-to-end) ──
  Transcribed 4 segment(s), 331 total chars
  ✓ End-to-end pipeline transcribes real speech correctly

── VAD parameters ──
  threshold=0.5 → 4 regions, threshold=0.9 → 8 regions
  ✓ VAD parameters affect segmentation output

Integration genuineness

Pass. Silero VAD SDK is imported and used for real speech detection (load_silero_vad, get_speech_timestamps, read_audio). VAD runs locally on real audio, segments are extracted, then each is sent to Deepgram for transcription. No bypass — audio flows through Silero VAD's processing pipeline before reaching Deepgram. No raw WebSocket/fetch calls. Credential check exits with code 2 when missing.

Code quality

  • ✅ Official Deepgram SDK (deepgram-sdk==6.1.1) — correct version
  • tag="deepgram-examples" present on all Deepgram API calls
  • ✅ No hardcoded credentials
  • ✅ Error handling: credential check + file existence check in main()
  • ✅ Tests import from src/ and call the example's actual functions (detect_speech_regions, extract_segment_bytes, process_audio)
  • ✅ Transcript assertions use length/duration proportionality (min_chars = max(5, audio_duration_sec * 2)) — no word lists
  • ✅ Credential check runs before SDK imports in tests (module-level check at top of test file with sys.exit(2))
  • ⚠️ Minor: import torchaudio (segmenter.py:24) is unused — it's a transitive dependency of silero_vad but not directly referenced. Safe to remove.

Documentation

  • ✅ README includes "What you'll build", env vars table with console link, install/run instructions, CLI options, and "How it works" explanation
  • .env.example present and complete

✓ All checks pass. Ready for merge.

Note: Attempted to push a fix for the unused torchaudio import but write access was denied. This is a cosmetic issue only and does not block approval.


Review by Lead on 2026-04-10

@github-actions
Contributor Author

Code Review

Overall: APPROVED

Tests ran ✅

tests/test_example.py::test_detect_speech_regions PASSED
tests/test_example.py::test_extract_segment_bytes PASSED
tests/test_example.py::test_process_audio_end_to_end PASSED
tests/test_example.py::test_vad_parameters_affect_output PASSED

4 passed, 0 failed

── detect_speech_regions ──
  Found 4 speech region(s)
    0.07s - 2.69s
    2.79s - 4.32s
    4.51s - 12.54s
    12.71s - 25.41s
  ✓ detect_speech_regions finds speech in real audio

── extract_segment_bytes ──
  ✓ extract_segment_bytes produces valid WAV

── process_audio (end-to-end) ──
  Transcribed 4 segment(s), 331 total chars
    [0.1s-2.7s] conf=1.00
    [2.8s-4.3s] conf=0.53
    [4.5s-12.5s] conf=1.00
    [12.7s-25.4s] conf=1.00
  ✓ End-to-end pipeline transcribes real speech correctly

── VAD parameters ──
  threshold=0.5 → 4 regions, threshold=0.9 → 8 regions
  ✓ VAD parameters affect segmentation output

Integration genuineness ✅

  • Silero VAD SDK imported and used (get_speech_timestamps, load_silero_vad, read_audio)
  • Real VAD inference runs locally on actual audio — not mocked or hardcoded
  • Deepgram SDK used via client.listen.v1.media.transcribe_file() — no raw HTTP/WebSocket
  • Architecture is correct: Silero VAD is a local pre-processing library (no cloud API), so DeepgramClient is used directly to transcribe VAD-extracted segments
  • No bypass — audio flows through Silero VAD segmentation before reaching Deepgram

Code quality ✅

  • deepgram-sdk==6.1.1 — matches required version
  • tag="deepgram-examples" present on Deepgram API call
  • No hardcoded credentials
  • Error handling: missing API key check and file existence check in main()
  • Tests import from src/ and test the example's actual exported functions
  • Transcript assertions use length/duration proportionality (audio_duration_sec * 2 min chars) — no word-list assertions
  • Credential check exits with code 2 before any SDK calls in tests

Documentation ✅

  • README includes what you'll build, env vars with console links, install/run instructions, CLI options, and architecture overview
  • .env.example present with DEEPGRAM_API_KEY

✓ All checks pass. Ready for merge.


Review by Lead on 2026-04-10

@github-actions
Contributor Author

Code Review

Overall: APPROVED

Tests ran ✅

tests/test_example.py::test_detect_speech_regions PASSED
tests/test_example.py::test_extract_segment_bytes PASSED
tests/test_example.py::test_process_audio_end_to_end PASSED
tests/test_example.py::test_vad_parameters_affect_output PASSED

4 passed, 0 failed (5.69s)

── detect_speech_regions ──
  Found 4 speech region(s)
    0.07s - 2.69s / 2.79s - 4.32s / 4.51s - 12.54s / 12.71s - 25.41s
  ✓ detect_speech_regions finds speech in real audio

── extract_segment_bytes ──
  ✓ extract_segment_bytes produces valid WAV

── process_audio (end-to-end) ──
  Transcribed 4 segment(s), 331 total chars
    [0.1s-2.7s] conf=1.00 'Yeah. As as much as, it's worth'
    [2.8s-4.3s] conf=0.53 'Celebrating'
    [4.5s-12.5s] conf=1.00 'The first, spacewalk, with an all female team...'
    [12.7s-25.4s] conf=1.00 'And, I think if it signifies anything, it is...'
  ✓ End-to-end pipeline transcribes real speech correctly

── VAD parameters ──
  threshold=0.5 → 4 regions, threshold=0.9 → 8 regions
  ✓ VAD parameters affect segmentation output

Integration genuineness

Pass — All 6 checks pass:

  1. Silero VAD SDK imported and used (load_silero_vad, get_speech_timestamps, read_audio)
  2. Real VAD calls made on actual audio — not mocked or hardcoded
  3. .env.example lists DEEPGRAM_API_KEY (Silero VAD is a local library, no API key needed)
  4. Tests exit with code 2 if credentials missing
  5. Bypass check: N/A — Silero VAD is a local pre-processing library, not a partner wrapping Deepgram. Direct DeepgramClient use is correct for the transcription step.
  6. No raw protocol: All Deepgram API contact uses the official SDK

Code quality

  • deepgram-sdk==6.1.1 — matches required version
  • tag="deepgram-examples" present on Deepgram API call (segmenter.py:106)
  • ✅ No hardcoded credentials
  • ✅ Error handling: credential check + file existence check in main()
  • ✅ Tests import from src/ and call the example's actual functions
  • ✅ Transcript assertions use length/duration proportionality (min_chars = max(5, audio_duration_sec * 2)) — no flaky word lists
  • ✅ Credential check runs before SDK operations
  • ℹ️ Minor: torchaudio imported but unused in segmenter.py:24 (silero_vad handles loading via read_audio). Non-blocking.

Documentation

  • ✅ README has "What you'll build", env vars table with console links, install/run instructions, CLI options, and "How it works" breakdown
  • .env.example present and complete

✓ All checks pass. Ready for merge.


Review by Lead on 2026-04-10

@github-actions
Contributor Author

Code Review

Overall: APPROVED

Tests ran ✅

── detect_speech_regions ──
  Found 4 speech region(s)
    0.07s - 2.69s
    2.79s - 4.32s
    4.51s - 12.54s
    12.71s - 25.41s
  ✓ detect_speech_regions finds speech in real audio

── extract_segment_bytes ──
  ✓ extract_segment_bytes produces valid WAV

── process_audio (end-to-end) ──
  Detected 4 speech region(s). Transcribing...
  Transcribed 4 segment(s), 331 total chars
  ✓ End-to-end pipeline transcribes real speech correctly

── VAD parameters ──
  threshold=0.5 → 4 regions, threshold=0.9 → 8 regions
  ✓ VAD parameters affect segmentation output

Results: 4 passed, 0 failed

Integration genuineness

Pass — Silero VAD SDK is imported and used for real speech detection (get_speech_timestamps, load_silero_vad, read_audio). Audio flows through Silero VAD for segmentation, then each detected speech region is transcribed via the Deepgram SDK. Silero VAD is a local processing library (no credentials needed), so .env.example correctly lists only DEEPGRAM_API_KEY. No bypass — Deepgram is used for STT, Silero for VAD, each in its proper role.

Code quality

  • Official Deepgram SDK deepgram-sdk==6.1.1 (correct pinned version)
  • tag="deepgram-examples" present on all Deepgram API calls
  • No hardcoded credentials
  • Error handling covers missing API key and missing audio file
  • Tests import from src/ and call the example's actual code (detect_speech_regions, extract_segment_bytes, process_audio)
  • Credential check runs first (exit 2) before SDK imports
  • Transcript assertions use length/duration proportionality, not word lists

Documentation

  • README includes "What you'll build", env vars table with console link, install/run instructions, CLI options, and how-it-works section
  • .env.example present and complete

✓ All checks pass. Ready for merge.


Review by Lead on 2026-04-11


Labels

  • integration:silero-vad (Integration: Silero VAD)
  • language:python (Language: Python)
  • status:review-passed (Self-review passed)
  • type:example (New example)
