MovieAgent by showlab

Automated movie generation from scripts

Created 7 months ago
254 stars

Top 99.1% on SourcePulse

View on GitHub
Project Summary

MovieAgent addresses a central limitation of current long-form video generation frameworks: they typically lack automated planning and require significant manual intervention for storyline, scene composition, and cinematography. It offers automated movie generation via multi-agent Chain of Thought (CoT) planning, enabling the creation of multi-scene, multi-shot videos with coherent narratives, character consistency, and synchronized audio and subtitles. The project is aimed at researchers and developers in AI video generation who want to streamline the production pipeline and bridge the gap between AI capabilities and high-quality filmmaking.

How It Works

The project's core innovation is a multi-agent CoT planning framework that simulates roles like director, screenwriter, and storyboard artist. These LLM agents engage in a hierarchical CoT reasoning process to automatically structure scenes, camera settings, and cinematography. This approach facilitates automated storyline development and scene composition, moving beyond simple frame generation towards narrative coherence and reduced human effort in filmmaking.
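The repository does not document its internal agent interfaces, so the hierarchy described above can only be pictured schematically. The following minimal Python sketch is a hypothetical illustration of role-based hierarchical CoT planning; the Agent class, plan_movie function, and role prompts are all assumptions for exposition, not MovieAgent's actual API.

```python
# Hypothetical sketch of hierarchical multi-agent CoT planning.
# None of these names come from MovieAgent's codebase; they only
# illustrate the director -> screenwriter -> storyboard hierarchy.
from dataclasses import dataclass


@dataclass
class Agent:
    role: str          # e.g., "director", "screenwriter", "storyboard artist"
    instructions: str  # role-specific CoT prompt prefix

    def reason(self, llm, context: str) -> str:
        # One CoT step: combine the role prompt with upstream decisions.
        prompt = f"You are the {self.role}. {self.instructions}\n\n{context}"
        return llm(prompt)  # `llm` is any text-in/text-out callable


def plan_movie(llm, script: str) -> dict:
    """Hierarchical CoT: each agent refines the previous agent's output."""
    director = Agent("director", "Break the script into scenes with narrative goals.")
    screenwriter = Agent("screenwriter", "Expand each scene into shots and dialogue.")
    storyboard = Agent("storyboard artist", "Specify camera settings for each shot.")

    scenes = director.reason(llm, script)
    shots = screenwriter.reason(llm, scenes)
    boards = storyboard.reason(llm, shots)
    return {"scenes": scenes, "shots": shots, "storyboards": boards}
```

The point of the hierarchy is that each role conditions on the decisions of the role above it, which is how the framework replaces manual storyline and cinematography planning with automated reasoning.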

Quick Start & Requirements

Installation involves cloning the repository and creating a Python 3.8 environment with Conda (conda create -n MovieAgent python=3.8, then conda activate MovieAgent). Dependencies are installed via pip install -r requirements.txt. Specific models such as ROICtrl require CUDA 12.1 and PyTorch 2.4. Users must prepare movie scripts and character assets (photos, audio) in a predefined directory structure. API keys and model names are configured in movie_agent/script/run.sh. Some models, such as StoryDiffusion and ROICtrl, may require manual setup or downloading of pre-trained weights.
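Condensed into commands, the setup reads roughly as follows. The conda, pip, and run.sh paths come from the text above; invoking the script with bash as the entry point is an assumption about the intended workflow.

```bash
# Create and activate the Python 3.8 environment
conda create -n MovieAgent python=3.8
conda activate MovieAgent

# Install Python dependencies
pip install -r requirements.txt

# Prepare movie scripts and character assets (photos, audio) in the
# repository's predefined directory structure, and set API keys and
# model names in movie_agent/script/run.sh. Then launch generation
# (assumed entry point):
bash movie_agent/script/run.sh
```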

Highlighted Details

  • Features training-free inference capabilities.
  • Generates coherent, multi-scene, multi-shot long-form videos.
  • Ensures character consistency, synchronized subtitles, and stable audio throughout generated content.
  • Employs hierarchical CoT reasoning for automated directorial decisions and scene structuring.
  • Supports integration with LLMs like GPT-4o and generation models including ROICtrl, SVD, and HunyuanVideo_I2V.

Maintenance & Community

The provided documentation does not detail specific maintainers, community channels (e.g., Discord, Slack), or a public roadmap. A note indicating "Rep initialization (No code)" suggests potential early-stage development or incomplete components within the repository.

Licensing & Compatibility

The README does not specify the software license. Consequently, its compatibility for commercial use or integration within closed-source projects cannot be determined from the available information.

Limitations & Caveats

Certain advanced features, particularly those involving specific image and video generation models like ROICtrl and StoryDiffusion, require manual configuration and may depend on particular hardware and software environments (e.g., CUDA 12.1). The framework's reliance on external models and API keys introduces potential setup complexities and points of failure.

Health Check

  • Last Commit: 6 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 13 stars in the last 30 days
