text2video  by bravekingzhang

CLI tool for text-to-video generation

created 2 years ago
1,030 stars

Top 37.1% on sourcepulse

Project Summary

This project provides a one-click tool to convert text into videos, aimed at enabling visualized reading of novels. It addresses the challenge of creating engaging video content from static text by integrating multiple AI models for image generation, speech synthesis, and prompt engineering.

How It Works

The system splits the input text into sentences at punctuation marks. Each sentence is translated into English with Youdao Translate, because English input yields better image generation quality. A large language model then turns the translated text into Midjourney-style prompts. Stable Diffusion and Hugging Face models create an image from each prompt, while Edge-TTS generates the speech. OpenCV merges the images into a video and draws the original sentence as a subtitle at the bottom of the frame. The length of each sentence's audio determines how long its image stays on screen. Finally, FFmpeg combines the audio and video into the complete output.
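The first step above, splitting text at punctuation, can be sketched as follows. This is a minimal illustration, not the project's own code: the function name `split_sentences` and the exact punctuation set are assumptions, and the ASCII period is deliberately left out so that decimals such as 3.14 are not split.

```python
import re

# Assumed sentence-ending punctuation (Chinese and Western); the project's
# actual set may differ. The ASCII "." is excluded on purpose.
_SENTENCE = re.compile(r"[^。！？!?]+[。！？!?]*")


def split_sentences(text: str) -> list[str]:
    """Split text into sentences, keeping the closing punctuation on each one."""
    # Each match is a run of non-terminator characters followed by any
    # terminators that close it; whitespace-only pieces are dropped.
    return [m.strip() for m in _SENTENCE.findall(text) if m.strip()]


if __name__ == "__main__":
    for sentence in split_sentences("你好。世界！"):
        print(sentence)
```

Each returned sentence would then flow through the translation, prompt, image, and speech steps independently, which is what lets the subtitle and audio stay aligned per sentence.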

Quick Start & Requirements

  • Install/Run: docker-compose up --build
  • Prerequisites: macOS, Python 3.10.12, FFmpeg (version 6.0 confirmed).
  • API Keys: OpenAI API key (with optional proxy URL) and Hugging Face API token are required for prompt generation and image creation, respectively.
  • Setup: Local development requires pip install -r requirements.txt.
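
A quick check of the prerequisites listed above might look like the sketch below. The environment variable names `OPENAI_API_KEY` and `HUGGINGFACE_API_TOKEN` are hypothetical placeholders; the project may read its keys from differently named variables or from a config file.

```python
import os
import shutil

# Hypothetical variable names, used only for illustration.
REQUIRED_ENV = ("OPENAI_API_KEY", "HUGGINGFACE_API_TOKEN")


def missing_prerequisites(env, ffmpeg_path) -> list[str]:
    """Return a human-readable list of missing prerequisites (empty if none)."""
    problems = []
    if ffmpeg_path is None:
        problems.append("ffmpeg not found on PATH")
    for name in REQUIRED_ENV:
        if not env.get(name):
            problems.append(f"{name} is not set")
    return problems


if __name__ == "__main__":
    # shutil.which returns None when the executable cannot be found.
    for problem in missing_prerequisites(os.environ, shutil.which("ffmpeg")):
        print("Missing:", problem)
```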

Highlighted Details

  • Leverages Stable Diffusion for image generation and Edge-TTS for speech.
  • Utilizes LLMs to generate Midjourney-like prompts for enhanced image quality.
  • Integrates Youdao Translate for English translation of Chinese text to improve image generation.
  • Outputs MP4 videos with synchronized audio and subtitles.
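
To illustrate how audio and video can be kept in sync, the sketch below derives the number of frames for one image from the length of its audio clip and builds an FFmpeg command that muxes a silent video with an audio track. The function names, the 30 fps default, and the file names are assumptions for illustration; the project's actual frame rate and FFmpeg arguments are not stated here.

```python
def frames_for_clip(audio_seconds: float, fps: int = 30) -> int:
    """Number of video frames an image is held for, given its audio length."""
    # Always emit at least one frame so very short clips still appear.
    return max(1, round(audio_seconds * fps))


def build_mux_command(video_path: str, audio_path: str, output_path: str) -> list[str]:
    """Build an FFmpeg argument list that combines a video and an audio file."""
    return [
        "ffmpeg", "-y",          # overwrite the output file if it exists
        "-i", video_path,        # input 0: silent video
        "-i", audio_path,        # input 1: narration audio
        "-c:v", "copy",          # keep the video stream as-is
        "-c:a", "aac",           # encode audio as AAC for the MP4 container
        "-shortest",             # stop at the end of the shorter stream
        output_path,
    ]


if __name__ == "__main__":
    print(frames_for_clip(2.5))
    print(" ".join(build_mux_command("video.mp4", "audio.mp3", "output.mp4")))
```

The argument list can be handed to `subprocess.run(cmd, check=True)`; passing a list rather than a shell string avoids quoting problems with file names.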

Maintenance & Community

The project is maintained by bravekingzhang, who can be reached through the WeChat public account "老码沉思录".

Licensing & Compatibility

  • License: MIT.
  • Compatibility: The MIT license permits commercial use and linking with closed-source projects.

Limitations & Caveats

The project notes that image generation from Chinese text is of lower quality, which is why sentences are translated into English first. Environments other than macOS with Python 3.10.12 may run into compatibility issues.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1+ week
  • Pull Requests (30d): 0
  • Issues (30d): 0

Star History

  • 19 stars in the last 90 days
