VeOmni by ByteDance-Seed

Framework for scaling multimodal model training across accelerators

Created 11 months ago
1,657 stars

Top 25.0% on SourcePulse

View on GitHub
Project Summary

VeOmni is a PyTorch-native framework designed for scaling large model training across diverse accelerators. It targets researchers and engineers working with single- or multi-modal models, offering flexibility and control by avoiding rigid trainer classes and exposing the full training logic.

How It Works

VeOmni emphasizes a modular, trainer-free design, allowing users to integrate custom components and maintain linear training scripts for maximum transparency. It leverages PyTorch's native functions for broad compatibility and performance, supporting advanced parallelism strategies like DeviceMesh, FSDP1/2, and experimental expert parallelism. Features such as activation offloading and checkpointing are integrated to manage memory and improve efficiency.
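The "trainer-free, linear script" idea can be sketched in plain PyTorch. This is a hypothetical illustration of the style VeOmni describes, not code from the framework; the model, data, and hyperparameters are all made up for the example:

```python
# Hypothetical sketch of a trainer-free, linear training script:
# every step (forward, loss, backward, update) is explicit in the
# script rather than hidden inside a Trainer class.
import torch
from torch import nn

model = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 1))
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

# Toy data: 64 random samples (illustrative only).
inputs = torch.randn(64, 8)
targets = torch.randn(64, 1)

for step in range(10):
    optimizer.zero_grad()
    loss = loss_fn(model(inputs), targets)  # full training logic is visible
    loss.backward()                         # no hidden callbacks or hooks
    optimizer.step()
```

Because the loop is ordinary Python, users can swap in custom parallelism wrappers, logging, or checkpointing at any point without subclassing anything.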

Quick Start & Requirements

  • Install: pip3 install veomni, or pip3 install -e . from a cloned source tree.
  • Prerequisites: PyTorch, Python. Specific model training examples may require datasets (e.g., fineweb) and model weights (e.g., Qwen2.5-7B).
  • Quick Start: bash train.sh $TRAIN_SCRIPT $CONFIG.yaml
  • Examples: Detailed examples for Qwen2.5-VL, Qwen2.5, and Llama3 are provided.
  • Docs: VeOmni Best Practice

Highlighted Details

  • Supports various parallelism strategies: DeviceMesh, FSDP1/2, Expert parallelism (experimental), Sequence parallelism.
  • Integrates memory-saving techniques: Activation offloading, Activation checkpointing.
  • Offers distributed checkpointing via ByteCheckpoint for efficient saving and merging.
  • Supports a range of models including DeepSeek, Llama 3, and Qwen 2/2.5 variants.
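Of the memory-saving techniques listed above, activation checkpointing is available directly in PyTorch via torch.utils.checkpoint; a minimal sketch is below. VeOmni's own integration may differ, and the model here is an assumed toy example:

```python
# Minimal activation-checkpointing sketch using PyTorch's built-in
# torch.utils.checkpoint (VeOmni's integration may look different).
import torch
from torch import nn
from torch.utils.checkpoint import checkpoint

class CheckpointedMLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.blocks = nn.ModuleList(
            nn.Sequential(nn.Linear(16, 16), nn.ReLU()) for _ in range(4)
        )

    def forward(self, x):
        for block in self.blocks:
            # Activations inside `block` are recomputed during backward
            # instead of being stored, trading compute for memory.
            x = checkpoint(block, x, use_reentrant=False)
        return x

model = CheckpointedMLP()
x = torch.randn(2, 16, requires_grad=True)
out = model(x)
out.sum().backward()
```

The trade-off is one extra forward pass through each checkpointed block at backward time, in exchange for not holding its intermediate activations in memory.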

Maintenance & Community

  • Project released on April 3, 2025.
  • Contributions are welcome via CONTRIBUTING.md.
  • Roadmap to be updated.

Licensing & Compatibility

  • Licensed under Apache License 2.0.
  • Permissive license suitable for commercial use and integration with closed-source projects.

Limitations & Caveats

The "veScale" component for FSDP is not yet available. Some advanced features like expert parallelism are marked as experimental. Performance benchmarks are pending a technical report.

Health Check
Last Commit

19 hours ago

Responsiveness

Inactive

Pull Requests (30d)
75
Issues (30d)
12
Star History
85 stars in the last 30 days

Explore Similar Projects

Starred by Luca Soldaini (Research Scientist at Ai2), Edward Sun (Research Scientist at Meta Superintelligence Lab), and 4 more.

parallelformers by tunib-ai

0%
791
Toolkit for easy model parallelization
Created 4 years ago
Updated 2 years ago
Starred by Théophile Gervet (Cofounder of Genesis AI), Jason Knight (Director AI Compilers at NVIDIA; Cofounder of OctoML), and 7 more.

lingua by facebookresearch

0.0%
5k
LLM research codebase for training and inference
Created 1 year ago
Updated 7 months ago
Starred by Andrej Karpathy (Founder of Eureka Labs; Formerly at Tesla, OpenAI; Author of CS 231n), Woosuk Kwon (Coauthor of vLLM), and 15 more.

torchtitan by pytorch

0.2%
5k
PyTorch platform for generative AI model training research
Created 2 years ago
Updated 20 hours ago
Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), Jiaming Song (Chief Scientist at Luma AI), and 23 more.

Megatron-LM by NVIDIA

0.3%
15k
Framework for training transformer models at scale
Created 7 years ago
Updated 19 hours ago