Kevin-thu/StoryMem: AI for multi-shot long video storytelling
StoryMem addresses the challenge of generating coherent, minute-long, multi-shot narrative videos from text-based story scripts. It targets researchers and developers in AI video generation, offering a solution for creating visually appealing videos with consistent characters and cinematic quality across sequential shots. The primary benefit is enabling automated, high-fidelity storytelling through advanced diffusion models.
How It Works
StoryMem employs a memory-conditioned, shot-by-shot video diffusion approach. It initiates generation using a Text-to-Video (T2V) model to create the first shot, establishing an initial memory state. Subsequent shots are generated using a fine-tuned Memory-to-Video (M2V) LoRA model, which leverages the accumulated memory from previous shots. This iterative process, where memory is updated after each generated shot, ensures high character coherence and visual consistency throughout the narrative, leading to cinematic quality.
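The shot-by-shot loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not StoryMem's actual code: `Shot`, `Memory`, `stub_t2v`, `stub_m2v`, and `generate_story` are hypothetical names, the "frames" are plain strings standing in for video, and the keyframe-selection rule and memory cap are invented for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Shot:
    prompt: str
    frames: List[str]  # stand-in for decoded video frames
    source: str        # "t2v" or "m2v", recorded only for illustration


@dataclass
class Memory:
    keyframes: List[str] = field(default_factory=list)
    max_keyframes: int = 4  # hypothetical cap, not taken from the README

    def update(self, shot: Shot) -> None:
        # Assumption: keep the last frame of each shot as its keyframe.
        self.keyframes.append(shot.frames[-1])
        # Drop the oldest keyframes once the cap is exceeded.
        self.keyframes = self.keyframes[-self.max_keyframes:]


def stub_t2v(prompt: str) -> Shot:
    # Placeholder for the Text-to-Video model that produces the first shot.
    return Shot(prompt, [f"{prompt}#frame{i}" for i in range(3)], "t2v")


def stub_m2v(prompt: str, memory: Memory) -> Shot:
    # Placeholder for the Memory-to-Video model. A real model would condition
    # on the memory contents; this stub only records how many keyframes it saw.
    tag = f"mem{len(memory.keyframes)}"
    return Shot(prompt, [f"{prompt}#{tag}#frame{i}" for i in range(3)], "m2v")


def generate_story(
    prompts: List[str],
    t2v: Callable[[str], Shot] = stub_t2v,
    m2v: Callable[[str, Memory], Shot] = stub_m2v,
) -> List[Shot]:
    memory = Memory()
    shots: List[Shot] = []
    for index, prompt in enumerate(prompts):
        # First shot comes from T2V; later shots are conditioned on memory.
        shot = t2v(prompt) if index == 0 else m2v(prompt, memory)
        memory.update(shot)  # memory is refreshed after every shot
        shots.append(shot)
    return shots
```

Calling `generate_story(["opening", "chase", "ending"])` returns three shots, the first produced by the T2V stub and the remaining two by the M2V stub, each of which sees one more keyframe in memory than the shot before it.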
Quick Start & Requirements
Installation: clone the repository (git clone --single-branch --branch main git@github.com:Kevin-thu/StoryMem.git), navigate into the directory, create and activate a Conda environment (conda create -n storymem python=3.11, conda activate storymem), and install dependencies (pip install -r requirements.txt, pip install flash_attn).
Run the example: bash run_example.sh.
(... ./story subfolder).
Highlighted Details
Maintenance & Community
The project acknowledges its foundation on Wan2.2. No specific details regarding active maintenance, community channels (e.g., Discord, Slack), or a public roadmap are provided in the README.
Licensing & Compatibility
The README does not specify a software license. This absence presents a significant adoption blocker, as the terms for use, modification, and distribution are undefined, potentially restricting commercial or even academic applications.
Limitations & Caveats
The README does not explicitly list known limitations, bugs, or alpha status. However, the nature of advanced AI video generation implies substantial computational resource requirements (e.g., high-end GPUs, significant VRAM) and potentially long generation times, which are typical caveats for such systems. The lack of a defined license is a critical caveat for any potential adoption.