TaleStreamAI  by zqq-nuli

Automated AI novel-to-video workflow

Created 6 months ago
467 stars

Top 65.2% on SourcePulse

View on GitHub
Project Summary

This project provides an automated workflow for generating AI-powered novel promotional videos, transforming raw novel content into engaging video summaries. It targets content creators and enthusiasts looking to streamline the production of promotional material for novels, leveraging multiple AI models for various stages of the pipeline.

How It Works

The workflow orchestrates a series of Python scripts, each responsible for a specific task: fetching novel content, generating scene storyboards using Gemini, refining prompts with DeepSeek, creating images with Stable Diffusion (via aaaki forge), synthesizing audio with CosyVoice2, generating subtitles with Whisper, and finally assembling video clips using FFmpeg with GPU acceleration. This modular approach allows for flexibility and the integration of different AI models at each step.
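The sequential orchestration described above can be sketched in a few lines. This is a minimal illustration, not the project's actual main.py; the script filenames below are hypothetical placeholders for the seven stages:

```python
import subprocess

# Hypothetical stage scripts -- the repository's actual filenames may differ.
PIPELINE = [
    "fetch_novel.py",        # fetch the raw novel content
    "storyboard_gemini.py",  # scene storyboard via Gemini
    "refine_deepseek.py",    # prompt refinement via DeepSeek
    "gen_images_sd.py",      # images via Stable Diffusion (aaaki forge)
    "tts_cosyvoice2.py",     # audio synthesis via CosyVoice2
    "subtitles_whisper.py",  # subtitle generation via Whisper
    "assemble_ffmpeg.py",    # final assembly via GPU-accelerated FFmpeg
]

def run_pipeline(scripts=PIPELINE, runner=subprocess.run):
    """Run each stage in order, stopping at the first failure."""
    for script in scripts:
        result = runner(["python", script])
        if result.returncode != 0:
            raise RuntimeError(f"stage failed: {script}")
```

The `runner` parameter is injected so the loop can be tested (or swapped for a dry-run) without launching real processes.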

Quick Start & Requirements

  • Installation: Uses uv for dependency management. Install with pip install uv, create a virtual environment with uv venv --python 3.12, activate it, and install requirements with uv add -r requirements.txt.
  • Prerequisites: Python >= 3.10, NVIDIA GPU with CUDA support (e.g., CUDA 11.8 or 12.6 recommended for PyTorch installation), FFmpeg (GPU accelerated version recommended), and API keys for services like Gemini and CosyVoice2. A .env file is required for configuration.
  • Setup: Requires configuring API keys and potentially downloading specific AI models. The setup time depends on download speeds and model sizes.
  • Documentation: Links to official quick-start guides are not explicitly provided, but the README details the step-by-step execution of each script.
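Since the .env file drives configuration, a small preflight check can fail fast on missing keys. The variable names below are hypothetical, as this summary does not list the project's exact key names:

```python
import os

# Hypothetical key names -- check the project's .env documentation for the real ones.
REQUIRED_KEYS = ["GEMINI_API_KEY", "COSYVOICE_API_KEY"]

def missing_keys(env=os.environ):
    """Return the required configuration keys that are unset or empty."""
    return [k for k in REQUIRED_KEYS if not env.get(k)]
```

Calling `missing_keys()` before the first pipeline stage turns a mid-run API failure into an immediate, actionable error.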

Highlighted Details

  • Supports multiple AI models for text generation (Gemini, DeepSeek), image generation (aaaki forge), and audio synthesis (CosyVoice2).
  • Includes automatic subtitle generation using Whisper.
  • Employs GPU-accelerated FFmpeg for faster video processing.
  • The scripts can be run individually in sequence, or via main.py, which orchestrates the entire process.
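A GPU-accelerated FFmpeg invocation for the assembly step might be built like this; the flag choices are an assumption, not taken from the project's source:

```python
def nvenc_command(input_path, output_path, codec="h264_nvenc"):
    """Build an FFmpeg command that decodes on CUDA and encodes with NVENC."""
    return [
        "ffmpeg", "-y",
        "-hwaccel", "cuda",   # GPU-side decoding
        "-i", input_path,
        "-c:v", codec,        # NVIDIA hardware encoder (h264_nvenc / hevc_nvenc)
        output_path,
    ]
```

Building the command as a list (rather than a shell string) makes it safe to pass directly to subprocess.run, even when paths contain spaces.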

Maintenance & Community

No specific information on contributors, sponsorships, or community channels (like Discord/Slack) is provided in the README.

Licensing & Compatibility

The README does not specify a license. Compatibility for commercial use or closed-source linking is not mentioned.

Limitations & Caveats

The project requires specific API keys and may need manual adjustments for high-concurrency Gemini usage. The README does not detail operating-system support beyond the implied Linux/macOS shell commands and the Windows activation path (.venv\Scripts\activate). The choice of Whisper model size drives VRAM requirements, with the largest models needing up to 10 GB.
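The VRAM trade-off can be captured as a lookup; the figures below are the approximate requirements published in the openai/whisper README:

```python
# Approximate required VRAM per Whisper model size, in GB
# (figures from the openai/whisper README; treat as rough guidance).
WHISPER_VRAM_GB = {"tiny": 1, "base": 1, "small": 2, "medium": 5, "large": 10}

def largest_model_for(available_gb):
    """Pick the biggest Whisper model that fits the given VRAM budget."""
    candidates = [m for m, gb in WHISPER_VRAM_GB.items() if gb <= available_gb]
    return max(candidates, key=WHISPER_VRAM_GB.get) if candidates else None
```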

Health Check
Last Commit

6 months ago

Responsiveness

Inactive

Pull Requests (30d)
0
Issues (30d)
0
Star History
31 stars in the last 30 days

Explore Similar Projects

Starred by Chip Huyen (author of "AI Engineering" and "Designing Machine Learning Systems") and Jiaming Song (Chief Scientist at Luma AI).

MoneyPrinterTurbo by harry0703

Top 0.4% on SourcePulse
40k stars
AI tool for one-click short video generation from text prompts
Created 1 year ago
Updated 3 months ago