auto-video-generateor by kuangdd2024

Automatic video generator from a topic

Created 1 year ago

832 stars

Top 42.7% on SourcePulse

Project Summary

This project provides an automated video generation system that takes a topic as input and produces a narrated explainer video. It's designed for users who want to quickly create educational or informational content without manual editing, leveraging LLMs for scriptwriting, TTS for narration, and text-to-image models for visuals.

How It Works

The system orchestrates a multi-stage pipeline: first, a large language model (LLM) generates a story or narration based on the user-provided topic. This text is then segmented, and a text-to-speech (TTS) engine converts it into audio. Concurrently, a text-to-image model generates visuals that match the narrative content. Finally, these audio and image assets are combined to produce the final explainer video. The project emphasizes a "free" approach, utilizing open-source or free-tier services where possible.
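The staged flow described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the names `Scene`, `split_into_segments`, and `build_scenes` are hypothetical, the three stage callables stand in for whichever LLM, TTS, and text-to-image services are configured, and the sketch runs the stages sequentially rather than concurrently. The final step of combining audio and images into a video is left out.

```python
import re
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Scene:
    """One narration segment together with the assets generated for it."""
    text: str
    audio_path: str
    image_path: str


def split_into_segments(narration: str) -> List[str]:
    """Split narration into sentences on ASCII or Chinese end punctuation.

    Deliberately simplistic: it would also split inside decimals like 3.14.
    """
    pieces = re.findall(r"[^.!?。！？]+[.!?。！？]*", narration)
    return [piece.strip() for piece in pieces if piece.strip()]


def build_scenes(
    topic: str,
    write_script: Callable[[str], str],      # assumed: topic -> narration text
    synthesize: Callable[[str, int], str],   # assumed: (text, index) -> audio path
    illustrate: Callable[[str, int], str],   # assumed: (text, index) -> image path
) -> List[Scene]:
    """Run the script, speech, and image stages for every segment."""
    narration = write_script(topic)
    scenes = []
    for index, text in enumerate(split_into_segments(narration)):
        scenes.append(
            Scene(
                text=text,
                audio_path=synthesize(text, index),
                image_path=illustrate(text, index),
            )
        )
    return scenes


if __name__ == "__main__":
    # Stub stages so the sketch runs without any external service.
    demo = build_scenes(
        "volcanoes",
        write_script=lambda topic: f"This is a story about {topic}. It is short!",
        synthesize=lambda text, i: f"audio_{i}.mp3",
        illustrate=lambda text, i: f"image_{i}.png",
    )
    for scene in demo:
        print(scene)
```

Passing the stages in as callables keeps the orchestration independent of any one provider, which fits the project's mix of paid and free services.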

Quick Start & Requirements

Run via python main.py <parameter> (e.g., python main.py 4).
Access the Gradio interface at http://127.0.0.1:8000/.
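Dependencies include Python and the pyttsx3, pillow, and moviepy packages; the README also lists re, which is part of the Python standard library and needs no installation. The LLM, TTS, and text-to-image services may require API keys or service-specific configuration (e.g., Baidu Qianfan).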
A demo is available at http://avg.kddbot.com/.
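Before running main.py, it can help to confirm that the third-party packages listed above are importable. The snippet below is a small sketch for that purpose and is not part of the project; the `REQUIRED_MODULES` mapping and `find_missing` helper are hypothetical names. Note that pillow is imported under the module name PIL.

```python
import importlib.util

# Package name -> importable top-level module name.
# The list mirrors the dependencies named above; it is not taken from a
# requirements file, so adjust it to match the actual project.
REQUIRED_MODULES = {
    "pyttsx3": "pyttsx3",
    "pillow": "PIL",
    "moviepy": "moviepy",
}


def find_missing(modules=REQUIRED_MODULES):
    """Return the package names whose top-level module cannot be located."""
    return [
        package
        for package, module in modules.items()
        if importlib.util.find_spec(module) is None
    ]


if __name__ == "__main__":
    missing = find_missing()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All listed third-party dependencies are importable.")
```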

Highlighted Details

Supports a "free" generation mode that relies on free resources, such as edge-tts for narration.
Offers a "review and generate" mode for editing text, audio, and image assets before final video creation.
Integrates with Baidu Qianfan for LLM (ERNIE series) and image generation.
Utilizes edge-tts as a free alternative for TTS.
Saves generated multimedia materials in a structured directory.
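The summary does not say how the saved materials are organized, so the following is only one plausible layout, sketched with pathlib. The folder names in `ASSET_KINDS`, the zero-padded file naming, and both helper functions are assumptions made for illustration, not the project's actual structure.

```python
import tempfile
from pathlib import Path

# Hypothetical sub-folders, one per kind of generated material.
ASSET_KINDS = ("text", "audio", "image", "video")


def create_project_dirs(root, project_name):
    """Create one folder per asset kind under root/project_name."""
    project = Path(root) / project_name
    dirs = {}
    for kind in ASSET_KINDS:
        path = project / kind
        path.mkdir(parents=True, exist_ok=True)
        dirs[kind] = path
    return dirs


def asset_path(dirs, kind, index, suffix):
    """Build a zero-padded file path such as audio/0003.mp3."""
    return dirs[kind] / f"{index:04d}{suffix}"


if __name__ == "__main__":
    # Use a temporary directory so the demo leaves nothing behind.
    with tempfile.TemporaryDirectory() as tmp:
        dirs = create_project_dirs(tmp, "demo-topic")
        print(asset_path(dirs, "audio", 3, ".mp3"))
        print(asset_path(dirs, "image", 3, ".png"))
```

Keeping the segment index in the file name makes it straightforward to pair each audio clip with its image when the assets are reviewed or reloaded later.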

Maintenance & Community

The project has 832 stars on GitHub and gained 18 in the last 30 days, which indicates continued community interest. The README links to Baidu Cloud documentation for the models it uses. The demo can also be tried through the WeChat public account "趣聊机器人".

Licensing & Compatibility

The README does not explicitly state a license. The project utilizes various components, some of which may have their own licenses (e.g., Baidu Qianfan services, edge-tts). Compatibility for commercial use would depend on the licenses of the underlying AI services and libraries used.

Limitations & Caveats

Video generation can be slow, so users may need to wait or use "load parameters" and "load resources" to access pre-generated content. Problems with text-based images might cause generation errors on the backend. The "review and generate" workflow requires completing a full generation pass first, including "create record," before individual assets can be edited. Some features, such as batch downloading and subtitle integration, are still on the to-do list.

Health Check

Last Commit

2 months ago

Responsiveness

Inactive

Star History

18 stars in the last 30 days