YUME  by stdstu12

Interactive world generation from text, image, or video

Created 1 month ago
284 stars

Top 92.0% on SourcePulse

View on GitHub
Project Summary

Yume is an interactive world generation model designed for creating realistic and dynamic visual content from text, image, or video inputs. It targets researchers and developers in AI-driven content creation, offering a framework for long-form video generation with fine-grained control over camera and character actions.

How It Works

Yume applies a distillation recipe to video Diffusion Transformer (DiT) models and provides FramePack-like training code. It supports long-video generation and distributed (DDP/FSDP) training and sampling. Interactive control comes from text prompts that specify camera movement and character actions, enabling dynamic scene generation.
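The FramePack idea referenced above is to keep the model's context bounded during long-video generation by progressively compressing older frames while recent frames stay at full resolution. The sketch below is a minimal, hypothetical illustration of that compression schedule (per-frame token budgets only); the function name and parameters are illustrative, not Yume's actual API.

```python
def pack_context(frame_tokens, keep_recent=2, min_tokens=1):
    """FramePack-style context compression sketch (hypothetical).

    `frame_tokens` lists the token count of each frame, oldest first.
    The newest `keep_recent` frames keep their full token budget; each
    step further back halves the budget (floored at `min_tokens`), so
    total context grows roughly logarithmically rather than linearly
    with video length.
    """
    packed = []
    n = len(frame_tokens)
    for i, tokens in enumerate(frame_tokens):
        age = n - 1 - i  # 0 = newest frame
        if age < keep_recent:
            budget = tokens  # recent frames: uncompressed
        else:
            # halve the budget for each additional step of age
            budget = max(min_tokens, tokens >> (age - keep_recent + 1))
        packed.append(min(tokens, budget))
    return packed

# Six 16-token frames: the two newest stay full, older ones shrink.
print(pack_context([16] * 6))  # → [1, 2, 4, 8, 16, 16]
```

With this schedule, six full frames (96 tokens) compress to 47 tokens of context, and the savings grow as the video lengthens.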

Quick Start & Requirements

Highlighted Details

  • Supports image-to-video and text-to-video generation.
  • Interactive control via text prompts for camera and character actions.
  • Distillation recipes for video DiT models.
  • DDP/FSDP sampling support for long video generation.
  • Training requires a minimum of 16 A100 GPUs.
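One way DDP-style parallel sampling of a long video could be organized is to split the frame range into overlapping windows and distribute the windows across workers. The sketch below is an assumption-laden illustration of that partitioning only (the names `assign_windows`, `window`, and `overlap` are invented here); it does not reflect Yume's actual distributed implementation.

```python
def assign_windows(total_frames, window, overlap, world_size):
    """Hypothetical sketch: split a long video into overlapping
    sampling windows and assign them round-robin to `world_size`
    workers, as a DDP-style sampler might.

    Returns {rank: [(start, end), ...]} with half-open frame ranges.
    """
    stride = window - overlap
    # Each window starts `stride` frames after the previous one so that
    # consecutive windows share `overlap` frames for continuity.
    windows = [
        (start, min(start + window, total_frames))
        for start in range(0, total_frames - overlap, stride)
    ]
    # Round-robin assignment: rank r takes every world_size-th window.
    return {r: windows[r::world_size] for r in range(world_size)}

# 100 frames, 32-frame windows, 8-frame overlap, 2 workers:
print(assign_windows(100, 32, 8, 2))
# → {0: [(0, 32), (48, 80)], 1: [(24, 56), (72, 100)]}
```

In a real pipeline the overlapping frames would be used to stitch neighboring windows together; here the sketch only shows the work partition.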

Maintenance & Community

The project is associated with an arXiv paper (2507.17744) and a Hugging Face model repository. Contributions are welcome.

Licensing & Compatibility

The repository does not explicitly state a license. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

Training is resource-intensive, requiring a minimum of 16 A100 GPUs. The project is under active development, with stated plans for FP8 support and quantized models.

Health Check
Last Commit

2 weeks ago

Responsiveness

Inactive

Pull Requests (30d)
0
Issues (30d)
2
Star History
76 stars in the last 30 days

Explore Similar Projects

Starred by Jiaming Song (Chief Scientist at Luma AI), Alex Yu (Research Scientist at OpenAI; former cofounder of Luma AI), and 1 more.

Lumina-T2X by Alpha-VLLM

0%
2k
Framework for text-to-any modality generation
Created 1 year ago
Updated 6 months ago
Starred by Chip Huyen (author of "AI Engineering" and "Designing Machine Learning Systems"), Luca Antiga (CTO of Lightning AI), and 2 more.

mmagic by open-mmlab

0.1%
7k
AIGC toolbox for image/video editing and generation
Created 6 years ago
Updated 1 year ago