Vlogger by Vchitect

AI system for minute-level vlog generation from user descriptions

Created 2 years ago

431 stars

Top 68.9% on SourcePulse

Project Summary

Vlogger is an AI system designed to generate minute-level video blogs (vlogs) from user descriptions. It targets users who need to create longer, narrative-driven video content, offering a structured approach to complex video generation tasks. The system aims to simplify vlog creation by mimicking human production workflows, enabling coherent and engaging long-form video output from simple text prompts.

How It Works

Vlogger employs a modular architecture, leveraging a Large Language Model (LLM) as a "Director" to orchestrate the generation process. This Director decomposes the vlog creation into four stages: Script generation, Actor selection, ShowMaker (video snippet generation), and Voicer (audio generation). The core innovation is the "ShowMaker," a novel video diffusion model that acts as a videographer. ShowMaker enhances spatial-temporal coherence by incorporating textual and visual prompts from the Script and Actor stages, utilizing a mixed training paradigm for both text-to-video (T2V) generation and prediction.

Quick Start & Requirements

Environment Setup: Use conda create -n vlogger python==3.10.11 and conda activate vlogger, then pip install -r requirements.txt.
Model Downloads: Requires Stable Diffusion v1.4, OpenCLIP-ViT-H-14, and the Vlogger ShowMaker checkpoint. All should be placed in a ./pretrained directory.
Dependencies: Python 3.10.11, PyTorch, Hugging Face libraries, and an OpenAI API key for LLM planning.
Inference:
- Script/Actor generation: python sample_scripts/vlog_write_script.py
- Vlog generation: python sample_scripts/vlog_read_script_sample.py
Resources: Requires downloading several large model checkpoints.
Links: Project Page, Hugging Face Models, Hugging Face Space.

Highlighted Details

Achieves state-of-the-art performance on zero-shot T2V generation and prediction.
Capable of generating vlogs exceeding 5 minutes with maintained coherence.
Integrates LLMs for directorial planning and foundation models for professional roles.
ShowMaker model enhances snippet coherence using text and image prompts.

Maintenance & Community

The project is associated with researchers from institutions like PJLab. Contact information for key contributors is provided. The code is built upon existing libraries like SEINE, LaVie, diffusers, and Stable Diffusion.

Licensing & Compatibility

The code is licensed under Apache-2.0. Model weights are fully open for academic research and permit free commercial usage. For commercial licensing inquiries, contact zhuangshaobin@pjlab.org.cn.

Limitations & Caveats

The system is not trained for realistic representation of people or events, and its use for generating demeaning, harmful, or violent content is prohibited. Users are solely liable for their actions.

Vlogger by Vchitect

Explore Similar Projects

t2v-turbo by Ji4chenLi

univa by univa-agent

AI-ContentCraft by nicekate

director_ai by freestylefly

memo by memoavatar

TaleStreamAI by zqq-nuli

auto-video-generateor by kuangdd2024

Pixelle-Video by AIDC-AI

story-flicks by alecm20

ShortGPT by RayVentura

MoneyPrinterPlus by ddean2009

MoneyPrinterTurbo by harry0703