Interactive world generation from text, image, or video
Top 92.0% on SourcePulse
Yume is an interactive world generation model designed for creating realistic and dynamic visual content from text, image, or video inputs. It targets researchers and developers in AI-driven content creation, offering a framework for long-form video generation with fine-grained control over camera and character actions.
How It Works
Yume builds on a distillation recipe for video Diffusion Transformer (DiT) models and provides FramePack-like training code. It supports long-video generation, distributed training (DDP/FSDP), and efficient sampling. Interactive control is exposed through text prompts that specify camera movement and character actions, enabling dynamic scene generation.
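The chunk-by-chunk, action-conditioned rollout described above can be sketched as follows. This is a minimal illustration only: the function names, chunk size, and frame representation are hypothetical and do not reflect the repository's actual API, and the stub below stands in for the real DiT denoising step.

```python
# Hypothetical sketch of a Yume-style interactive rollout.
# `generate_chunk` is a stand-in for the real DiT sampling call;
# names and structure here are illustrative assumptions only.

def generate_chunk(context_frames, action_prompt, chunk_len=16):
    """Stub for a denoising call: returns `chunk_len` new 'frames'
    conditioned on prior context and a text action prompt."""
    # The real model runs iterative denoising; here each frame is
    # just a (prompt, index) record so the control flow is visible.
    return [(action_prompt, i) for i in range(chunk_len)]

def interactive_rollout(actions, chunk_len=16):
    """FramePack-style long-video generation: produce the video one
    chunk at a time, feeding recent frames back in as context."""
    video = []
    for action in actions:
        context = video[-chunk_len:]  # most recent frames as context
        video.extend(generate_chunk(context, action, chunk_len))
    return video

video = interactive_rollout(["camera: pan left", "character: walk forward"])
print(len(video))  # 2 actions x 16 frames per chunk = 32
```

The key design point is that each user action only conditions the next chunk, so control can be interleaved with generation rather than fixed up front.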
Quick Start & Requirements
Install dependencies with:

```shell
pip install -r requirements.txt
pip install .   # re-run after code modifications
```

Inference scripts are provided: `scripts/inference/sample_jpg.sh` (image-to-video) and `scripts/inference/sample.sh` (general video).
Maintenance & Community
The project is associated with an arXiv paper (2507.17744) and a Hugging Face model repository. Contributions are welcomed.
Licensing & Compatibility
The repository does not explicitly state a license. Compatibility for commercial use or closed-source linking is not specified.
Limitations & Caveats
Training is resource-intensive, requiring a minimum of 16 A100 GPUs. The project is under active development, with FP8 support and quantized models on the stated roadmap.