CLI tool for text-to-video generation
Top 37.1% on sourcepulse
This project provides a one-click tool that converts text into video, enabling visualized reading of novels. It addresses the challenge of turning static text into engaging video content by integrating multiple AI models for image generation, speech synthesis, and prompt engineering.
How It Works
The pipeline runs as follows:
1. The text is segmented into sentences using punctuation.
2. Each sentence is translated into English via Youdao Translate, since English prompts yield better image quality.
3. A large language model turns the translated sentence into a Midjourney-style prompt.
4. Stable Diffusion models (via Hugging Face) generate images from these prompts, while Edge-TTS synthesizes speech for each sentence.
5. OpenCV merges the images into a video, overlaying the original sentence as a subtitle at the bottom; each image stays on screen for the duration of its audio.
6. Finally, FFmpeg combines the audio and video into a complete output.
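The segmentation and timing steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function names and the punctuation regex are assumptions.

```python
import re

def split_sentences(text: str) -> list[str]:
    """Segment text into sentences on Chinese/English end punctuation,
    keeping the punctuation attached to each sentence."""
    parts = re.split(r"(?<=[。！？!?.])\s*", text)
    return [p for p in parts if p.strip()]

def frames_for_audio(duration_sec: float, fps: int = 24) -> int:
    """Each sentence's image stays on screen for as long as its
    synthesized audio plays; convert that duration to a frame count."""
    return max(1, round(duration_sec * fps))

if __name__ == "__main__":
    text = "夜色渐深。他推开门,走了出去!外面下着雨?"
    print(split_sentences(text))
    print(frames_for_audio(2.5))  # 60 frames at 24 fps
```

In the real pipeline, the per-sentence audio duration would come from the Edge-TTS output file, and OpenCV would write `frames_for_audio(...)` copies of the subtitled image before FFmpeg muxes in the audio track.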
Quick Start & Requirements
Run with Docker:
docker-compose up --build
Or install the dependencies locally:
pip install -r requirements.txt
Maintenance & Community
The project is maintained by bravekingzhang. Contact information for the author includes a WeChat public account ("老码沉思录").
Limitations & Caveats
The project notes that text-to-image generation quality from Chinese prompts is suboptimal, which is why sentences are first translated into English. The project was developed on macOS with Python 3.10.12; other environments may present compatibility issues.