DramaBox by resemble-ai

AI-powered expressive TTS with voice cloning

Created 1 month ago
397 stars

Top 72.4% on SourcePulse

Summary

DramaBox is an expressive Text-to-Speech (TTS) system with voice cloning, built on Lightricks' LTX-2.3 audio model. It targets developers and researchers who want fine-grained control over synthesized speech: emotional delivery, speaker identity, and stylistic variation are all steered through natural language prompts. The primary benefit is human-like, contextually rich audio whose character is set by the prompt rather than by separate model parameters.

How It Works

This project is an IC-LoRA fine-tune of the LTX-2.3 3.3B audio-only model. Its core idea is prompt-driven TTS: a detailed natural language prompt dictates speaker identity, emotion, delivery style, and even non-verbal sounds such as laughs and sighs. An optional 10-second voice reference adds timbre cloning, so the prompt controls how a line is delivered while the reference controls whose voice it sounds like.
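
As an illustration of preparing the optional voice reference, the sketch below trims a WAV file to its first 10 seconds using only the Python standard library. It assumes the reference is an uncompressed PCM WAV file; the function name trim_wav and the file names are hypothetical, and the summary does not say whether DramaBox requires this step or trims the reference itself.

```python
import wave


def trim_wav(src_path, dst_path, max_seconds=10.0):
    """Copy at most max_seconds of audio from src_path to dst_path.

    Returns the duration (in seconds) of the file that was written.
    Assumes an uncompressed PCM WAV input (an assumption, not a
    documented DramaBox requirement).
    """
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        # Number of frames that fit in the requested duration.
        max_frames = int(max_seconds * src.getframerate())
        frames = src.readframes(min(max_frames, src.getnframes()))

    with wave.open(dst_path, "wb") as dst:
        # Reuse channel count, sample width, and sample rate from the source;
        # the frame count in the header is corrected when the file is closed.
        dst.setparams(params)
        dst.writeframes(frames)

    with wave.open(dst_path, "rb") as check:
        return check.getnframes() / check.getframerate()
```

Calling trim_wav("speaker.wav", "speaker_10s.wav") would write a clip of at most 10 seconds; a source shorter than that is copied unchanged.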

Quick Start & Requirements

  • Installation/Execution: Primarily via Python scripts (src/inference_server.py, src/inference.py, app.py).
  • Prerequisites: Requires a CUDA-enabled GPU. Peak VRAM usage is approximately 24 GB, which covers both the DiT transformer and the 12B-parameter text encoder. Models are auto-downloaded from Hugging Face on first run.
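
Before the first run, it can help to confirm that the local GPU is in the right range. The following is a minimal sketch, assuming PyTorch is installed; the helper names and the 24 GB threshold (taken from the approximate peak figure above, read here as decimal gigabytes) are illustrative and not part of DramaBox.

```python
REQUIRED_GB = 24  # approximate peak from the summary; a rough guide, not an exact limit


def has_enough_vram(total_bytes, required_gb=REQUIRED_GB):
    """Return True if total_bytes is at least required_gb decimal gigabytes."""
    return total_bytes >= required_gb * 10**9


def check_gpu():
    """Describe whether the first CUDA device meets the approximate requirement."""
    try:
        import torch  # assumption: PyTorch is the runtime in use
    except ImportError:
        return "PyTorch is not installed; cannot query the GPU."
    if not torch.cuda.is_available():
        return "No CUDA-enabled GPU detected."
    props = torch.cuda.get_device_properties(0)
    verdict = "meets" if has_enough_vram(props.total_memory) else "is below"
    total_gb = props.total_memory / 10**9
    return f"{props.name}: {total_gb:.1f} GB total, {verdict} the ~{REQUIRED_GB} GB peak."


if __name__ == "__main__":
    print(check_gpu())
```

The check compares total device memory only; memory already held by other processes is not accounted for, so a card that passes may still run out of memory in practice.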

Highlighted Details

  • Expressive Prompting: Control speaker identity, emotion, delivery style, and non-verbal sounds (laughs, sighs, pauses) via detailed text prompts.
  • Voice Cloning: Simple 10-second audio reference enables target timbre cloning.
  • Robust Watermarking: Outputs are automatically watermarked with Resemble Perth. The watermark survives common audio manipulations and can be disabled with --no-watermark.
  • LoRA Fine-tuning: Supports training custom LoRAs directly on DramaBox for specialized speaker voices, language flavors, or styles.

Maintenance & Community

Developed by Resemble AI, with significant contributions acknowledged from the Lightricks team for the base LTX-2.3 model. No specific community channels (e.g., Discord, Slack) or detailed roadmap information are provided in the README.

Licensing & Compatibility

Distributed under the LTX-2 Community License, which is derived from the LTX-2.3 base model license. Users must consult the LICENSE file for specific terms, particularly regarding commercial use and redistribution, as community licenses can impose restrictions.

Limitations & Caveats

The system requires substantial VRAM (~24 GB peak), which may put it out of reach of most consumer hardware. The LTX-2 Community License should be reviewed carefully before any commercial deployment. The project also notes that pre-merged checkpoints yielded degraded output in its testing, and it recommends running inference with the LoRA loaded separately.

Health Check

Last Commit: 4 days ago
Responsiveness: Inactive
Pull Requests (30d): 4
Issues (30d): 16
Star History: 398 stars in the last 30 days
