OmniVoice-Studio by debpalash

Local cinematic AI dubbing and voice generation studio

Created 2 months ago
6,753 stars

Top 7.5% on SourcePulse

Summary

OmniVoice Studio is a local, full-stack AI environment for cinematic audio dubbing, voice cloning, and voice generation. It targets engineers and power users who want self-contained, high-fidelity audio production on their own hardware, with no API keys or cloud services required.

How It Works

Built on the OmniVoice zero-shot diffusion model, which covers 600 languages, the studio performs voice manipulation and dubbing entirely locally. Its main advantage is that the full pipeline runs on the user's machine, with no dependence on a cloud provider. The system auto-detects the available hardware and accelerates inference on Apple Silicon (MPS), NVIDIA (CUDA), or AMD (ROCm) GPUs, falling back to the CPU when no accelerator is present.
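The README does not show how hardware detection is implemented. Assuming the backend uses PyTorch (which exposes all three accelerators), a minimal detection sketch might look like the following. `detect_device` is a hypothetical helper, not the studio's code. Note that ROCm builds of PyTorch reuse the `cuda` device name, so AMD GPUs are told apart via `torch.version.hip`.

```python
"""Hypothetical accelerator detection, assuming a PyTorch backend."""


def detect_device() -> tuple[str, str]:
    """Return (torch_device_name, human_readable_backend)."""
    try:
        import torch  # assumption: the inference stack is PyTorch
    except ImportError:
        return "cpu", "CPU (PyTorch not installed)"

    if torch.cuda.is_available():
        # ROCm builds of PyTorch expose AMD GPUs through the "cuda" device
        # and set torch.version.hip; CUDA builds leave it as None.
        backend = "AMD (ROCm)" if getattr(torch.version, "hip", None) else "NVIDIA (CUDA)"
        return "cuda", backend

    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps", "Apple Silicon (MPS)"

    return "cpu", "CPU"


if __name__ == "__main__":
    device, backend = detect_device()
    print(f"Running inference on {backend} (torch device: {device!r})")
```

Checking CUDA/ROCm before MPS is a common priority order; the studio itself may order or override these choices differently.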

Quick Start & Requirements

The recommended installation uses Docker:

  • git clone https://github.com/debpalash/OmniVoice-Studio.git
  • cd OmniVoice-Studio
  • docker compose up --build -d

Then open the studio at http://localhost:8000.

For local development, install ffmpeg, bun, and uv. Run uv sync to install the Python dependencies, start the backend with uvicorn backend.main:app, and start the frontend with bun install followed by bun run dev. Model weights (~1.2 GB) are downloaded automatically from HuggingFace on the first generation.
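Because docker compose up -d returns before the server is ready, and local development depends on three external tools, a small pre-flight script can help. The sketch below is hypothetical: `missing_prerequisites` and `wait_for_studio` are illustrative helpers, not part of the project. It only assumes the tool names and the default URL given above.

```python
"""Pre-flight checks for a local OmniVoice Studio setup (hypothetical helpers)."""
import shutil
import time
import urllib.error
import urllib.request

# Tools the README lists as prerequisites for local development.
REQUIRED_TOOLS = ("ffmpeg", "bun", "uv")


def missing_prerequisites(tools=REQUIRED_TOOLS) -> list[str]:
    """Return the names of required tools that are not on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def wait_for_studio(url: str = "http://localhost:8000",
                    timeout_s: float = 60.0,
                    interval_s: float = 1.0) -> bool:
    """Poll url until the server answers or timeout_s elapses."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=interval_s):
                return True
        except urllib.error.HTTPError:
            # Any HTTP status, even an error, means the server is up.
            return True
        except OSError:
            # Connection refused, reset, or timed out: not ready yet.
            time.sleep(interval_s)
    return False


if __name__ == "__main__":
    missing = missing_prerequisites()
    if missing:
        print("Missing prerequisites:", ", ".join(missing))
    print("Studio reachable:", wait_for_studio())
```

Treating an HTTP error status as "up" keeps the probe independent of which route the studio serves at the root URL.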

Highlighted Details

  • Video Dubbing: Transcribe, translate, re-voice, and mux audio into MP4 with selective track export.
  • Vocal Isolation: Uses demucs to separate speech from music, preserving background audio.
  • Voice Cloning: Clone voices from 3-second clips and design custom profiles via tags.
  • Audio Control: Fine-grained volume/gain (0-200%) per dubbed segment.
  • Workflow: Keyboard shortcuts, live model telemetry (CPU/RAM/VRAM), and persistent local projects via SQLite.
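Two of the features above can be sketched concretely: per-segment gain in the 0-200% range, and muxing a dubbed track into an MP4 with optional retention of the original audio. This is a hypothetical illustration under stated assumptions (mono float samples in [-1, 1], segments given as sample ranges, ffmpeg available on PATH); `apply_segment_gains` and `build_mux_command` are not the studio's actual functions.

```python
"""Hypothetical sketches of two dubbing steps: per-segment gain and MP4 muxing."""

Segment = tuple[int, int, float]  # (start_sample, end_sample, gain_percent)


def apply_segment_gains(track: list[float], segments: list[Segment]) -> list[float]:
    """Scale each [start, end) range by its gain (0-200%), clipping to [-1, 1]."""
    out = list(track)
    for start, end, gain_percent in segments:
        # Clamp to the 0-200% range the studio exposes.
        factor = max(0.0, min(200.0, gain_percent)) / 100.0
        for i in range(start, end):
            out[i] = max(-1.0, min(1.0, out[i] * factor))
    return out


def build_mux_command(video: str, dub_audio: str, output: str,
                      keep_original_audio: bool = False) -> list[str]:
    """Build an ffmpeg argument list that muxes a dubbed track into an MP4."""
    cmd = ["ffmpeg", "-y", "-i", video, "-i", dub_audio,
           "-map", "0:v:0",   # video from the source file
           "-map", "1:a:0"]   # dubbed audio as the first audio track
    if keep_original_audio:
        # Optionally keep the original audio as a second track ("?" = skip if absent).
        cmd += ["-map", "0:a:0?"]
    cmd += ["-c:v", "copy",   # copy the video stream without re-encoding
            "-c:a", "aac",
            "-shortest", output]
    return cmd
```

The argument list can be passed directly to subprocess.run(..., check=True). Copying the video stream avoids a lossy re-encode; only the audio is encoded.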

Maintenance & Community

Contributions are welcome via issues and pull requests. The README does not list maintainers, sponsorship options, or community channels such as Discord or Slack.

Licensing & Compatibility

The README does not specify a software license. Without one, suitability for commercial use or closed-source integration cannot be assessed.

Limitations & Caveats

Native desktop applications are planned but not yet available. "Real Speaker Diarization" is still under development, although ML-based diarization is already supported. The undisclosed license remains the main obstacle to adoption.

Health Check

Last Commit: 16 hours ago
Responsiveness: Inactive
Pull Requests (30d): 183
Issues (30d): 81
Star History: 6,086 stars in the last 30 days