OmniVoice-Studio by debpalash

Local cinematic AI dubbing and voice generation studio

Created 3 months ago

8,851 stars

Top 5.8% on SourcePulse

Project Summary

Summary

OmniVoice Studio is a local, full-stack AI environment for cinematic audio dubbing, voice cloning, and voice generation. It targets engineers and power users and needs no API keys or cloud services: the whole high-fidelity audio pipeline runs on the user's own hardware.

How It Works

Built on the OmniVoice 600-language zero-shot diffusion model, the studio performs voice cloning, voice generation, and dubbing locally. Its key advantage is a cloud-independent architecture: the full pipeline executes on the user's machine. The system auto-detects the available hardware and accelerates inference on Apple Silicon (MPS), NVIDIA (CUDA), and AMD (ROCm) GPUs, falling back to the CPU otherwise.
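The backend auto-detection described above can be sketched in Python. This is a minimal illustration, not the project's actual code: it assumes the studio runs on PyTorch (the README summary does not say so), and pick_device and detect_device are hypothetical names. The one fact it relies on is that ROCm builds of PyTorch expose AMD GPUs through the "cuda" device name and set torch.version.hip.

```python
from typing import Optional, Tuple


def pick_device(cuda_available: bool,
                mps_available: bool,
                hip_version: Optional[str] = None) -> Tuple[str, str]:
    """Return (torch device name, backend label). Hypothetical helper."""
    if cuda_available:
        # ROCm builds of PyTorch reuse the "cuda" device name; a non-empty
        # HIP version only changes the human-readable label.
        return ("cuda", "ROCm" if hip_version else "CUDA")
    if mps_available:
        return ("mps", "MPS")
    return ("cpu", "CPU")


def detect_device() -> Tuple[str, str]:
    """Probe PyTorch if it is installed; otherwise report CPU."""
    try:
        import torch  # assumption: the backend is PyTorch-based
    except ImportError:
        return ("cpu", "CPU")
    mps = getattr(torch.backends, "mps", None)
    return pick_device(
        cuda_available=torch.cuda.is_available(),
        mps_available=bool(mps is not None and mps.is_available()),
        hip_version=getattr(torch.version, "hip", None),
    )
```

Calling detect_device() returns a pair such as ("mps", "MPS") on an Apple Silicon machine with a suitable PyTorch build. Keeping the decision in pick_device, separate from the probing, makes the priority order easy to test without any GPU present.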

Quick Start & Requirements

The recommended installation uses Docker: clone the repository with git clone https://github.com/debpalash/OmniVoice-Studio.git, change into it with cd OmniVoice-Studio, and run docker compose up --build -d. The studio is then available at http://localhost:8000. For local development, the prerequisites are ffmpeg, bun, and uv. Run uv sync to install the backend dependencies, start the backend with uvicorn backend.main:app, and start the frontend with bun install followed by bun run dev. Model weights (~1.2 GB) download automatically from Hugging Face on first generation.
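Before attempting the local development setup, it can help to confirm that the three prerequisites are on the PATH. The following is a small sketch using only the Python standard library; find_tools and missing_tools are hypothetical helper names, and the tool list is taken from the paragraph above.

```python
import shutil
from typing import Callable, Dict, Iterable, List, Optional

# Prerequisites named for local development.
PREREQUISITES = ("ffmpeg", "bun", "uv")


def find_tools(names: Iterable[str] = PREREQUISITES,
               which: Callable[[str], Optional[str]] = shutil.which
               ) -> Dict[str, Optional[str]]:
    """Map each tool name to its resolved path, or None if not on PATH."""
    return {name: which(name) for name in names}


def missing_tools(found: Dict[str, Optional[str]]) -> List[str]:
    """List the tools that could not be located, in the order checked."""
    return [name for name, path in found.items() if path is None]


if __name__ == "__main__":
    missing = missing_tools(find_tools())
    if missing:
        print("Missing prerequisites:", ", ".join(missing))
    else:
        print("All prerequisites found.")
```

The which parameter exists only so the lookup can be replaced in tests; in normal use the default shutil.which is enough.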

Highlighted Details

Video Dubbing: Transcribe, translate, re-voice, and mux audio into MP4 with selective track export.
Vocal Isolation: Uses demucs to separate speech from music, preserving background audio.
Voice Cloning: Clone voices from 3-second clips and design custom profiles via tags.
Audio Control: Fine-grained volume/gain (0-200%) per dubbed segment.
Workflow: Keyboard shortcuts, live model telemetry (CPU/RAM/VRAM), and persistent local projects via SQLite.
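As an illustration of the per-segment gain control listed above, the sketch below maps a 0-200% setting to a linear factor of 0.0 to 2.0 and applies it to one segment. It is an assumption-laden example, not the studio's implementation: apply_gain is a hypothetical name, samples are assumed to be floats in the range -1.0 to 1.0, and hard clipping is chosen here only for simplicity.

```python
from typing import List, Sequence


def apply_gain(samples: Sequence[float], gain_percent: float) -> List[float]:
    """Scale one segment's samples by a 0-200% gain (hypothetical helper)."""
    if not 0.0 <= gain_percent <= 200.0:
        raise ValueError("gain_percent must be between 0 and 200")
    factor = gain_percent / 100.0  # 100% leaves the audio unchanged
    # Hard-clip to the assumed [-1.0, 1.0] sample range after scaling.
    return [max(-1.0, min(1.0, s * factor)) for s in samples]
```

For example, apply_gain([0.5, -0.25], 200) doubles the segment's amplitude, while a sample of 0.75 at 200% is clipped to 1.0. A production pipeline would more likely use a limiter than hard clipping to avoid audible distortion.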

Maintenance & Community

Contributions are welcome via issues and pull requests. The README does not list maintainers, sponsorships, or community channels (Discord/Slack).

Licensing & Compatibility

The software license is not specified in the README. This omission significantly hinders assessment for commercial use or closed-source integration.

Limitations & Caveats

Native desktop applications are planned but not yet released. Advanced "Real Speaker Diarization" is still under development, though ML-based diarization is supported. The undisclosed license is the primary adoption blocker.

Health Check

Last Commit: 1 day ago
Responsiveness: Inactive
Pull Requests (30d): 347
Issues (30d): 150

Star History

1,201 stars in the last 30 days