aaivu/DubGenix

🎙️ Real-Time English-to-Sinhala Dubbing System

A Final Year Research and Development Project
Department of Computer Science & Engineering, University of Moratuwa

📌 Overview

This project aims to build an AI-powered, real-time English-to-Sinhala dubbing system that preserves speaker identity and emotional tone. Users supply an English video (or a real-time stream) and receive synchronized Sinhala-dubbed output. Intended uses include accessibility, education, and low-resource language preservation.

The system supports both offline (Phase 1) and real-time (Phase 2) dubbing pipelines.


🚀 Features

  • 🎧 Automatic Speech Recognition (ASR) using [Faster-Whisper]
  • 🌍 Neural Machine Translation (NMT) using [Meta NLLB-200] with CTranslate2 for fast inference
  • 🗣️ Text-to-Speech (TTS) synthesis using a dual approach:
    • Fine-tuned [XTTS_v2] for Sinhala
    • [GPT-4o-mini-TTS] + [Seed-VC] for real-time speaker cloning
  • 🔈 Voice preservation
  • 🧠 Real-time streaming pipeline with sentence-aligned buffering
  • 🕰️ Audio-video synchronization using time-stretching
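
To illustrate what sentence-aligned buffering can look like, here is a minimal sketch. It assumes the streaming ASR stage emits English text fragments and that downstream stages should only see complete sentences. `SentenceBuffer` is a hypothetical helper written for this README, not code taken from the repository.

```python
import re

# Split on whitespace that follows terminal punctuation (assumed English input).
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


class SentenceBuffer:
    """Hypothetical helper: accumulates transcript fragments, releases whole sentences."""

    def __init__(self) -> None:
        self._pending = ""

    def push(self, fragment: str) -> list[str]:
        """Add a transcript fragment and return any sentences it completes."""
        self._pending = f"{self._pending} {fragment}".strip()
        parts = _SENTENCE_END.split(self._pending)
        if parts[-1].endswith((".", "!", "?")):
            # Everything buffered so far forms complete sentences.
            complete, self._pending = parts, ""
        else:
            # Hold back the unfinished tail until more text arrives.
            complete, self._pending = parts[:-1], parts[-1]
        return [s for s in complete if s]

    def flush(self) -> list[str]:
        """Release whatever is left, for example when the stream ends."""
        rest, self._pending = self._pending, ""
        return [rest] if rest else []
```

Each returned sentence could then be handed to translation and synthesis as one unit, so that a sentence is never translated before it is finished.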

🧪 System Architecture

The system follows a modular pipeline that processes English input (audio or video) and produces Sinhala dubbed output. It works in both offline and real-time modes.
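
One way to picture the modular design is as a chain of interchangeable stages that each enrich a shared segment record. The sketch below only illustrates that idea. `Segment`, `run_pipeline`, and the placeholder stages are hypothetical names, and the stubs stand in for the real ASR, NMT, and TTS models.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Segment:
    """Hypothetical record passed between pipeline stages."""
    start: float           # seconds into the source audio
    end: float
    text_en: str = ""      # filled by the ASR stage
    text_si: str = ""      # filled by the NMT stage
    audio_si: bytes = b""  # filled by the TTS stage


Stage = Callable[[Segment], Segment]


def run_pipeline(segment: Segment, stages: list[Stage]) -> Segment:
    """Apply each stage in order; the same chain can serve offline or streamed segments."""
    for stage in stages:
        segment = stage(segment)
    return segment


# Placeholder stages: real ones would call the ASR, NMT, and TTS models.
def stub_asr(seg: Segment) -> Segment:
    seg.text_en = "hello"
    return seg


def stub_nmt(seg: Segment) -> Segment:
    seg.text_si = f"<si:{seg.text_en}>"  # marker only, not a real translation
    return seg


def stub_tts(seg: Segment) -> Segment:
    seg.audio_si = seg.text_si.encode("utf-8")  # stands in for synthesized audio
    return seg


result = run_pipeline(Segment(start=0.0, end=1.5), [stub_asr, stub_nmt, stub_tts])
```

Because every stage has the same signature, either TTS option can be swapped in without touching the rest of the chain.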

🔄 Processing Flow

  1. 🎥 Input (English Audio/Video)
    The system accepts English video or audio files. In real-time mode, audio is processed in streamed chunks.

  2. 🧠 ASR – Automatic Speech Recognition
    Uses Faster-Whisper to transcribe English speech into text.
    In real-time mode, Voice Activity Detection (VAD) segments the audio into manageable units.

  3. 🌍 NMT – Neural Machine Translation
    Transcripts are translated into Sinhala using Meta NLLB-200 (distilled 1.3B) with CTranslate2 for low-latency execution.

  4. 🗣️ TTS – Text-to-Speech Synthesis
    Sinhala text is converted to Sinhala speech using one of two options:

    • XTTS_v2 (Fine-tuned): Default choice for high-quality, low-latency synthesis.
    • GPT-4o-mini-TTS + Seed-VC: Generates base voice with GPT-4o-mini-TTS, then applies speaker voice cloning with Seed-VC.
  5. ⏱️ Synchronization & Post-Processing
    Synthesized Sinhala audio is time-stretched using Librosa or Rubberband to align with the original English video timing.

  6. 📤 Output (Dubbed Sinhala Audio/Video)
    The final Sinhala audio replaces the English audio track in the video.

    • In offline mode: Full video is re-rendered.
    • In real-time mode: Output is played back with a buffering latency of about 2 s.
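
The synchronization step (step 5) comes down to choosing a stretch rate per segment. The sketch below shows one way to compute it, assuming the rate convention used by `librosa.effects.time_stretch`, where a rate above 1 shortens the audio. The function name and the clamping bounds of 0.8 and 1.25 are illustrative assumptions, not values from the project.

```python
def stretch_rate(synth_s: float, target_s: float,
                 lo: float = 0.8, hi: float = 1.25) -> float:
    """Return the rate that fits `synth_s` seconds of speech into `target_s` seconds.

    The rate is clamped to [lo, hi] (assumed bounds) so that the speech is never
    sped up or slowed down to the point of sounding unnatural.
    """
    if synth_s <= 0 or target_s <= 0:
        raise ValueError("durations must be positive")
    return min(hi, max(lo, synth_s / target_s))


def stretched_duration(synth_s: float, rate: float) -> float:
    """Duration after stretching: output length is input length divided by rate."""
    return synth_s / rate


# A 3.0 s Sinhala clip that must fit a 2.5 s English slot.
rate = stretch_rate(3.0, 2.5)
fitted = stretched_duration(3.0, rate)
```

When the clamp applies, the stretched clip will not match the slot exactly, so the remaining gap or overlap has to be handled some other way, for example with padding or trimming.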
