🎙️ Real-Time English-to-Sinhala Dubbing System

A Final Year Research and Development Project
Department of Computer Science & Engineering, University of Moratuwa

📌 Overview

This project aims to build an AI-powered real-time English-to-Sinhala dubbing system that preserves speaker identity and emotional tone. Users supply English videos (or real-time streams) and receive synchronized Sinhala-dubbed output, which makes the system suited to accessibility, education, and low-resource language preservation.

The system supports both offline (Phase 1) and real-time (Phase 2) dubbing pipelines.

🚀 Features

🎧 Automatic Speech Recognition (ASR) using [Faster-Whisper]
🌍 Neural Machine Translation (NMT) using [Meta NLLB-200] with CTranslate2 for fast inference
🗣️ Text-to-Speech (TTS) synthesis using a dual approach:
- Fine-tuned [XTTS_v2] for Sinhala
- [GPT-4o-mini-TTS] + [Seed-VC] for real-time speaker cloning
🔈 Voice preservation: the dubbed speech retains the original speaker's identity and emotional tone
🧠 Real-time streaming pipeline with sentence-aligned buffering
🕰️ Audio-video synchronization using time-stretching
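
As a rough illustration of sentence-aligned buffering, the sketch below collects streamed transcript fragments and releases text only once a full sentence is available. The `SentenceBuffer` class and its punctuation rule are assumptions made for this example, not the project's actual implementation; a simple rule like this will also split on abbreviations such as "Dr.".

```python
import re
from typing import List

# Shortest run of text that ends in sentence-final punctuation,
# followed by whitespace or the end of the buffer.
_SENTENCE_END = re.compile(r"(.+?[.!?])(?:\s+|$)", re.DOTALL)


class SentenceBuffer:
    """Accumulates streamed text fragments and releases whole sentences."""

    def __init__(self) -> None:
        self._pending = ""

    def push(self, fragment: str) -> List[str]:
        """Add a fragment and return any sentences it completes."""
        self._pending = (self._pending + " " + fragment).strip()
        sentences = []
        while True:
            match = _SENTENCE_END.match(self._pending)
            if match is None:
                break
            sentences.append(match.group(1).strip())
            self._pending = self._pending[match.end():]
        return sentences

    def flush(self) -> str:
        """Return whatever text remains at end of stream (may be empty)."""
        rest, self._pending = self._pending, ""
        return rest
```

Holding text until a sentence boundary gives the translation stage a complete unit to work with, at the cost of added latency while the sentence is still being spoken.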

🧪 System Architecture

The system follows a modular pipeline that processes English input (audio or video) and produces Sinhala dubbed output. It works in both offline and real-time modes.
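
To make the modular structure concrete, here is a minimal sketch of how the stages could be chained. The `Segment` type and the `dub` function are hypothetical names introduced for illustration; each stage is passed in as a plain callable so that any of them can be swapped out.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Segment:
    start: float          # seconds into the source audio
    end: float
    english: str          # ASR output
    sinhala: str = ""     # NMT output
    audio: bytes = b""    # synthesized, time-aligned Sinhala speech


def dub(
    source: str,
    asr: Callable[[str], Iterable[Segment]],
    nmt: Callable[[str], str],
    tts: Callable[[str], bytes],
    sync: Callable[[bytes, float], bytes],
) -> List[Segment]:
    """Run ASR -> NMT -> TTS -> synchronization over one input."""
    dubbed = []
    for seg in asr(source):
        seg.sinhala = nmt(seg.english)
        raw_audio = tts(seg.sinhala)
        # Fit the synthesized speech into the original segment's duration.
        seg.audio = sync(raw_audio, seg.end - seg.start)
        dubbed.append(seg)
    return dubbed
```

Because `asr` may return a generator, the same loop shape can serve both a whole file and a stream of chunks.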

🔄 Processing Flow

🎥 Input (English Audio/Video)
The system accepts English video or audio files. In real-time mode, audio is processed in streamed chunks.
🧠 ASR – Automatic Speech Recognition
Uses Faster-Whisper to transcribe English speech into text.
In real-time mode, Voice Activity Detection (VAD) segments the audio into manageable units.
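
This README does not say which VAD the project uses. The sketch below is a deliberately simple energy-based stand-in that shows the idea of cutting a stream into speech spans; the function name, threshold, and frame sizes are all assumptions. (Faster-Whisper's `transcribe` also accepts a `vad_filter=True` option, which may be what is used in practice.)

```python
import math
from typing import List, Sequence, Tuple


def energy_vad(
    samples: Sequence[float],
    sample_rate: int,
    frame_ms: int = 30,
    threshold: float = 0.02,
    min_silence_frames: int = 10,
) -> List[Tuple[float, float]]:
    """Return (start_s, end_s) spans whose frame RMS reaches `threshold`."""
    frame_len = max(1, sample_rate * frame_ms // 1000)
    n_frames = len(samples) // frame_len
    segments = []
    start = None   # frame index where the current speech span began
    silence = 0    # consecutive quiet frames seen inside the span

    def to_seconds(frame_index: int) -> float:
        return frame_index * frame_len / sample_rate

    for i in range(n_frames):
        frame = samples[i * frame_len:(i + 1) * frame_len]
        rms = math.sqrt(sum(x * x for x in frame) / len(frame))
        if rms >= threshold:
            if start is None:
                start = i
            silence = 0
        elif start is not None:
            silence += 1
            if silence >= min_silence_frames:
                # Close the span just after the last loud frame.
                segments.append((to_seconds(start), to_seconds(i - silence + 1)))
                start, silence = None, 0

    if start is not None:
        # Stream ended mid-span: trim any trailing quiet frames.
        segments.append((to_seconds(start), to_seconds(n_frames - silence)))
    return segments
```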
🌍 NMT – Neural Machine Translation
Transcripts are translated into Sinhala using Meta NLLB-200 (distilled 1.3B) with CTranslate2 for low-latency execution.
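
A hedged sketch of this translation step is shown below. It assumes the NLLB checkpoint has already been converted to CTranslate2 format (for example with `ct2-transformers-converter`); the `model_dir` path and the helper names are placeholders, not the project's own. `eng_Latn` and `sin_Sinh` are the FLORES-200 language codes NLLB uses for English and Sinhala.

```python
from typing import List

SRC_LANG = "eng_Latn"  # English
TGT_LANG = "sin_Sinh"  # Sinhala


def strip_target_prefix(tokens: List[str], tgt_lang: str = TGT_LANG) -> List[str]:
    """Drop the leading language token that each NLLB hypothesis starts with."""
    return tokens[1:] if tokens and tokens[0] == tgt_lang else tokens


def translate(
    sentences: List[str],
    model_dir: str = "nllb-200-distilled-1.3B-ct2",  # hypothetical local path
    device: str = "cpu",
) -> List[str]:
    """Translate English sentences to Sinhala with NLLB-200 on CTranslate2."""
    # Imported lazily so the helper above works without these packages.
    import ctranslate2
    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(
        "facebook/nllb-200-distilled-1.3B", src_lang=SRC_LANG
    )
    translator = ctranslate2.Translator(model_dir, device=device)

    # CTranslate2 consumes token strings rather than token ids.
    batch = [tokenizer.convert_ids_to_tokens(tokenizer.encode(s)) for s in sentences]
    results = translator.translate_batch(
        batch, target_prefix=[[TGT_LANG]] * len(batch)
    )

    outputs = []
    for result in results:
        tokens = strip_target_prefix(result.hypotheses[0])
        outputs.append(tokenizer.decode(tokenizer.convert_tokens_to_ids(tokens)))
    return outputs
```

Loading the tokenizer and translator once and reusing them across calls would avoid repeated start-up cost in a real-time setting; they are created inside the function here only to keep the example short.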
🗣️ TTS – Text-to-Speech Synthesis
Sinhala text is converted to Sinhala speech using one of two options:
- XTTS_v2 (Fine-tuned): Default choice for high-quality, low-latency synthesis.
- GPT-4o-mini-TTS + Seed-VC: Generates base voice with GPT-4o-mini-TTS, then applies speaker voice cloning with Seed-VC.
⏱️ Synchronization & Post-Processing
Synthesized Sinhala audio is time-stretched using Librosa or Rubberband to align with the original English video timing.
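
The alignment can be sketched as a rate calculation followed by a stretch. In the example below the clamp limits (0.8 and 1.25) and both function names are assumptions chosen for illustration; the README does not state what limits, if any, the project applies. `librosa.effects.time_stretch` speeds audio up when `rate` is above 1 and slows it down when below.

```python
def stretch_rate(
    synth_duration: float,
    target_duration: float,
    min_rate: float = 0.8,
    max_rate: float = 1.25,
) -> float:
    """Rate that fits synthesized speech into the original time slot.

    The result is clamped so speech is never sped up or slowed down
    beyond the given limits (assumed values).
    """
    if synth_duration <= 0 or target_duration <= 0:
        raise ValueError("durations must be positive")
    rate = synth_duration / target_duration
    return min(max(rate, min_rate), max_rate)


def fit_to_slot(path: str, target_duration: float):
    """Load an audio file and time-stretch it toward `target_duration` seconds."""
    import librosa  # imported lazily; only needed for the actual stretch

    samples, sample_rate = librosa.load(path, sr=None)
    rate = stretch_rate(len(samples) / sample_rate, target_duration)
    return librosa.effects.time_stretch(samples, rate=rate), sample_rate
```

When the clamp is hit the stretched audio will not exactly match the slot, so a real pipeline would need some policy for the leftover gap or overlap, such as padding with silence or borrowing time from a neighbouring pause.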
📤 Output (Dubbed Sinhala Audio/Video)
The final Sinhala audio replaces the English audio track in the video.
- In offline mode: Full video is re-rendered.
- In real-time mode: Output is played back with a buffering latency of about 2 s.
