minisora by mini-sora

Community initiative exploring Sora implementation and development

Created 1 year ago
1,263 stars

Top 31.3% on SourcePulse

Project Summary

This repository is a community-driven initiative focused on exploring and replicating the technology behind OpenAI's Sora, a text-to-video generation model. It aims to provide accessible implementations and foster research into diffusion models for video generation, targeting researchers and developers interested in state-of-the-art video synthesis.

How It Works

The project centers on reproducing key research papers and technologies related to Sora, such as DiT (Diffusion Transformer). It leverages existing frameworks like XTuner for efficient sequence training and aims to develop GPU-friendly and training-efficient models. The approach involves a comprehensive review of diffusion models for video generation, from DDPM to advanced transformer-based architectures.
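
To make the underlying technique concrete, the following is a minimal, hypothetical sketch of a DDPM-style noise-prediction training step in PyTorch. The ToyDenoiser class, tensor shapes, and hyperparameters are illustrative assumptions standing in for the DiT-style transformer backbone the project aims to reproduce; this is not the repository's actual code.

```python
import torch
import torch.nn.functional as F

# Hypothetical denoiser standing in for a DiT-style backbone:
# it takes noisy latents x_t and a timestep and predicts the added noise.
class ToyDenoiser(torch.nn.Module):
    def __init__(self, dim=64):
        super().__init__()
        self.time_embed = torch.nn.Embedding(1000, dim)
        self.net = torch.nn.Sequential(
            torch.nn.Linear(dim, 4 * dim), torch.nn.GELU(), torch.nn.Linear(4 * dim, dim)
        )

    def forward(self, x_t, t):
        return self.net(x_t + self.time_embed(t))

# Standard DDPM linear noise schedule over 1000 steps.
T = 1000
betas = torch.linspace(1e-4, 0.02, T)
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)

model = ToyDenoiser()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

def training_step(x0):
    """One DDPM noise-prediction step on clean stand-in latents x0 of shape [B, dim]."""
    t = torch.randint(0, T, (x0.shape[0],))                  # random timestep per sample
    noise = torch.randn_like(x0)                              # target noise epsilon
    a_bar = alphas_cumprod[t].unsqueeze(-1)                   # cumulative alpha for each t
    x_t = a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * noise    # forward diffusion q(x_t | x_0)
    loss = F.mse_loss(model(x_t, t), noise)                   # predict the noise, L2 objective
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Example: one step on random stand-in latents.
print(training_step(torch.randn(8, 64)))
```

A real DiT replaces the toy MLP with a transformer over patchified (video) latents and adds conditioning, but the forward-diffusion and noise-prediction loop shown here is the core objective the DDPM-to-DiT lineage builds on.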

Quick Start & Requirements

  • Installation: Installation steps are not explicitly documented; a standard Python and PyTorch environment is the likely baseline.
  • Requirements: The project aims for GPU-friendly operation, targeting configurations such as 8x A100 80GB, 8x A6000 48GB, or RTX 4090 24GB for training and inference (a hardware sanity-check sketch follows this list). Specific requirements for reproducing DiT mention 2x A100.
  • Resources: The project is actively recruiting contributors familiar with OpenMMLab's MMEngine and DiT.
  • Links: MiniSora-DiT
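
As a quick sanity check against the hardware targets listed above, the following illustrative snippet reports the CUDA devices PyTorch can see and their memory; it is a generic PyTorch utility, not part of the MiniSora codebase.

```python
import torch

# Illustrative check of local GPUs against the hardware targets mentioned above
# (e.g. 8x A100 80GB, 8x A6000 48GB, or RTX 4090 24GB).
if not torch.cuda.is_available():
    print("No CUDA devices visible; training/inference would fall back to CPU.")
else:
    for i in range(torch.cuda.device_count()):
        props = torch.cuda.get_device_properties(i)
        print(f"GPU {i}: {props.name}, {props.total_memory / 1024**3:.0f} GiB")
```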

Highlighted Details

  • Focus on reproducing DiT (Scalable Diffusion Models with Transformers).
  • Aims for GPU-friendly training and inference with moderate hardware.
  • Comprehensive survey of video generation models and related technologies.
  • Community-driven exploration of Sora's implementation and future directions.

Maintenance & Community

The project is driven by the MiniSora Community, which hosts regular round-table discussions on Sora-related topics among community members. It actively recruits contributors and provides links to WeChat groups for community engagement.

Licensing & Compatibility

The repository's license is not explicitly stated in the README.

Limitations & Caveats

The project is a community effort to replicate complex research; therefore, the fidelity and performance of reproduced models may vary. Specific implementation details and stability are subject to ongoing community development.

Health Check

  • Last Commit: 7 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 0 stars in the last 30 days

Starred by Lukas Biewald (Cofounder of Weights & Biases), Patrick von Platen (Author of Hugging Face Diffusers; Research Engineer at Mistral), and 2 more.

Explore Similar Projects

DialoGPT by microsoft

0.0% · 2k stars
Response generation model via large-scale pretraining
Created 6 years ago · Updated 3 years ago
Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems") and Elvis Saravia (Founder of DAIR.AI).

NExT-GPT by NExT-GPT

0.1% · 4k stars
Any-to-any multimodal LLM research paper
Created 2 years ago · Updated 5 months ago
Starred by Shizhe Diao (Author of LMFlow; Research Scientist at NVIDIA), Zack Li (Cofounder of Nexa AI), and 19 more.

LLaVA by haotian-liu

0.2% · 24k stars
Multimodal assistant with GPT-4 level capabilities
Created 2 years ago · Updated 1 year ago
Starred by Tobi Lutke (Cofounder of Shopify), Li Jiang (Coauthor of AutoGen; Engineer at Microsoft), and 27 more.

ColossalAI by hpcaitech

0.0% · 41k stars
AI system for large-scale parallel training
Created 4 years ago · Updated 1 day ago