CLI tool for text-to-video generation
Top 37.1% on sourcepulse
This project provides a one-click tool that converts text into video, enabling visualized reading of novels. It addresses the challenge of turning static text into engaging video content by integrating multiple AI models for image generation, speech synthesis, and prompt engineering.
How It Works
The pipeline runs as follows:
1. The text is segmented into sentences using punctuation.
2. Each sentence is translated into English via Youdao Translate, since English prompts yield better image quality.
3. A large language model turns the translated sentence into a Midjourney-style prompt.
4. Stable Diffusion models (via Hugging Face) generate images from these prompts, while Edge-TTS synthesizes speech for each sentence.
5. OpenCV merges the images into a video, overlaying the original sentence as a subtitle at the bottom; each image stays on screen for the duration of its audio.
6. Finally, FFmpeg combines the audio and video into a complete output.
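The segmentation and timing steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function names and the punctuation regex are assumptions.

```python
import re

def split_sentences(text: str) -> list[str]:
    """Segment text into sentences on Chinese/English end punctuation,
    keeping the punctuation attached to each sentence."""
    parts = re.split(r"(?<=[。！？!?.])\s*", text)
    return [p for p in parts if p.strip()]

def frames_for_audio(duration_sec: float, fps: int = 24) -> int:
    """Each sentence's image stays on screen for as long as its
    synthesized audio plays; convert that duration to a frame count."""
    return max(1, round(duration_sec * fps))

if __name__ == "__main__":
    text = "夜色渐深。他推开门,走了出去!外面下着雨?"
    print(split_sentences(text))
    print(frames_for_audio(2.5))  # 60 frames at 24 fps
```

In the real pipeline, the per-sentence audio duration would come from the Edge-TTS output file, and OpenCV would write `frames_for_audio(...)` copies of the subtitled image before FFmpeg muxes in the audio track.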
Quick Start & Requirements
Run with Docker:
docker-compose up --build
Or install the dependencies locally:
pip install -r requirements.txt
Maintenance & Community
The project is maintained by bravekingzhang. Contact information for the author includes a WeChat public account ("老码沉思录").
Limitations & Caveats
The project notes that text-to-image generation quality from Chinese prompts is suboptimal, which is why sentences are first translated into English. The project was developed on macOS with Python 3.10.12; other environments may present compatibility issues.