DramaBox by resemble-ai

AI-powered expressive TTS with voice cloning

Created 2 months ago

456 stars

Top 65.4% on SourcePulse

Project Summary

DramaBox is an expressive text-to-speech (TTS) system with voice cloning, built on Lightricks' LTX-2.3 audio model. It targets developers and researchers who want fine-grained control over synthesized speech: natural-language prompts set the emotional delivery, speaker identity, and style. The main benefit is human-like, context-aware audio whose delivery is steered directly by the prompt.

How It Works

This project is an IC-LoRA fine-tune of the LTX-2.3 3.3B audio-only model. Its core idea is prompt-driven TTS: a detailed natural-language prompt specifies speaker identity, emotion, delivery style, and even non-verbal sounds such as laughs and sighs. An optional 10-second voice reference adds timbre cloning.
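To make the prompt-driven idea concrete, here is a minimal sketch of how the controllable attributes (speaker identity, emotion, delivery style, non-verbal sounds) might be assembled into a single descriptive prompt. The `SpeechPrompt` class, its field names, and the sentence layout are all hypothetical; the summary does not document the prompt format DramaBox actually expects, so treat this only as an illustration of the concept.

```python
from dataclasses import dataclass, field


@dataclass
class SpeechPrompt:
    """Hypothetical container for the attributes a prompt can describe."""

    speaker: str    # who is talking, e.g. "A weary middle-aged man"
    emotion: str    # emotional colour, e.g. "resigned"
    delivery: str   # pacing and style, e.g. "slowly and quietly"
    line: str       # the words to be spoken
    nonverbal: list[str] = field(default_factory=list)  # laughs, sighs, pauses

    def render(self) -> str:
        # Build one natural-language description followed by the spoken line.
        # This layout is an assumption, not the project's documented format.
        parts = [f"{self.speaker}, sounding {self.emotion}, speaks {self.delivery}"]
        if self.nonverbal:
            parts.append("with " + " and ".join(self.nonverbal))
        description = " ".join(parts) + "."
        return f'{description} They say: "{self.line}"'
```

Keeping the attributes as separate fields makes it easy to vary one of them (say, the emotion) while holding the rest fixed, which is a convenient way to probe how much each part of the description influences the output.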

Quick Start & Requirements

Installation/Execution: Primarily via Python scripts (src/inference_server.py, src/inference.py, app.py).
Prerequisites: A CUDA-enabled GPU. Peak VRAM usage is approximately 24 GB, which covers the DiT transformer and the 12B-parameter text encoder. Models are downloaded automatically from Hugging Face on first run.
Links:
- Model Hub: ResembleAI/Dramabox
- Demo: ResembleAI/Dramabox (ZeroGPU)
- Base Model: Lightricks/LTX-2.3
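Before the first run it can help to confirm that a GPU with enough memory is present. The sketch below is not part of DramaBox; it shells out to `nvidia-smi` and compares each GPU's total memory with the stated peak. The function names are hypothetical, and reading "approximately 24 GB" as decimal gigabytes is an assumption.

```python
import subprocess

# Assumption: "approximately 24 GB" means 24e9 bytes, expressed here in MiB
# because nvidia-smi reports memory in MiB.
REQUIRED_MIB = 24 * 1000**3 // 2**20


def parse_total_mib(smi_output: str) -> list[int]:
    """Parse one integer (MiB) per non-empty line of nvidia-smi output."""
    return [int(line.strip()) for line in smi_output.splitlines() if line.strip()]


def query_total_mib() -> list[int]:
    """Ask nvidia-smi for the total memory of every visible GPU, in MiB."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.total", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_total_mib(result.stdout)


def gpus_with_enough_vram(totals_mib: list[int], required_mib: int = REQUIRED_MIB) -> list[int]:
    """Return the indices of GPUs whose total memory meets the requirement."""
    return [i for i, total in enumerate(totals_mib) if total >= required_mib]
```

Calling `gpus_with_enough_vram(query_total_mib())` on a machine with NVIDIA drivers installed returns the indices of suitable GPUs. Total memory is only an upper bound: memory already used by other processes reduces what is actually available.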

Highlighted Details

Expressive Prompting: Control speaker identity, emotion, delivery style, and non-verbal sounds (laughs, sighs, pauses) via detailed text prompts.
Voice Cloning: A 10-second audio reference is enough to clone the target timbre.
Robust Watermarking: Outputs are automatically watermarked with Resemble Perth. The watermark survives common audio manipulations and can be disabled with --no-watermark.
LoRA Fine-tuning: Supports training custom LoRAs directly on DramaBox for specialized speaker voices, language flavors, or styles.
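For the voice-cloning reference, a small preprocessing step can cut a longer recording down to its first 10 seconds. This is a minimal sketch using only Python's standard `wave` module; it assumes the reference is an uncompressed PCM WAV file, and the helper names are hypothetical. The summary does not say which audio formats DramaBox accepts or whether it trims the reference itself.

```python
import wave

REFERENCE_SECONDS = 10.0  # reference length mentioned in the project summary


def wav_duration_seconds(path: str) -> float:
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def trim_reference(src_path: str, dst_path: str,
                   max_seconds: float = REFERENCE_SECONDS) -> float:
    """Copy at most the first `max_seconds` of src_path into dst_path.

    Returns the duration of the written file in seconds.
    """
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        max_frames = int(max_seconds * src.getframerate())
        frames = src.readframes(min(src.getnframes(), max_frames))
    with wave.open(dst_path, "wb") as dst:
        # Reuse channel count, sample width and rate; the frame count in the
        # header is corrected automatically when the file is closed.
        dst.setparams(params)
        dst.writeframes(frames)
    return wav_duration_seconds(dst_path)
```

Recordings shorter than the limit are copied unchanged, so the helper is safe to apply to every candidate reference clip.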

Maintenance & Community

Developed by Resemble AI, which acknowledges the Lightricks team for the base LTX-2.3 model. The README lists no community channels (e.g., Discord, Slack) and no detailed roadmap.

Licensing & Compatibility

Distributed under the LTX-2 Community License, which is derived from the LTX-2.3 base model license. Consult the LICENSE file for the specific terms, particularly those on commercial use and redistribution, which community licenses often restrict.

Limitations & Caveats

The system needs substantial VRAM (~24 GB peak), which may put it out of reach of consumer hardware. The LTX-2 Community License should be reviewed carefully before commercial deployment. The project also notes that pre-merged checkpoints produced degraded output in its testing and recommends running inference with the LoRA loaded separately.

Health Check

Last Commit

1 month ago

Responsiveness

Inactive

Star History

26 stars in the last 30 days