VeOmni by ByteDance-Seed

Framework for scaling multimodal model training across accelerators

created 4 months ago
396 stars

Top 74.0% on sourcepulse

View on GitHub
Project Summary

VeOmni is a PyTorch-native framework designed for scaling large model training across diverse accelerators. It targets researchers and engineers working with single- or multi-modal models, offering flexibility and control by avoiding rigid trainer classes and exposing the full training logic.

How It Works

VeOmni emphasizes a modular, trainer-free design: users integrate custom components and keep training scripts linear for maximum transparency. It builds on PyTorch-native primitives for broad compatibility and performance, supporting parallelism via DeviceMesh, FSDP1/2, experimental expert parallelism, and sequence parallelism. Integrated activation offloading and activation checkpointing help manage memory and improve efficiency.
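The "trainer-free, linear script" style described above can be sketched in plain PyTorch. This is a hypothetical minimal example, not VeOmni's actual API: the whole loop is visible in the script, and `torch.utils.checkpoint` stands in for the framework's integrated activation checkpointing, trading recompute for memory.

```python
# Minimal sketch of a trainer-free training loop (assumed, not VeOmni's API).
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

# A tiny stand-in model; a real script would build the actual model here.
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 1))
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)

x = torch.randn(8, 16)  # stand-in batch
y = torch.randn(8, 1)

for step in range(3):
    optimizer.zero_grad()
    # Activation checkpointing: the first block's activations are not stored;
    # they are recomputed during backward to save memory.
    hidden = checkpoint(model[0], x, use_reentrant=False)
    out = model[2](model[1](hidden))
    loss = nn.functional.mse_loss(out, y)
    loss.backward()
    optimizer.step()
```

Because there is no Trainer class, swapping in a different parallelism wrapper or offloading policy is an edit to the script itself rather than a subclass or callback.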

Quick Start & Requirements

  • Install: pip3 install veomni, or pip3 install -e . from a source checkout.
  • Prerequisites: Python and PyTorch. Specific model training examples may also require datasets (e.g., fineweb) and model weights (e.g., Qwen2.5-7B).
  • Quick Start: bash train.sh $TRAIN_SCRIPT $CONFIG.yaml
  • Examples: Detailed examples for Qwen2.5-VL, Qwen2.5, and Llama3 are provided.
  • Docs: VeOmni Best Practice

Highlighted Details

  • Supports various parallelism strategies: DeviceMesh, FSDP1/2, Expert parallelism (experimental), Sequence parallelism.
  • Integrates memory-saving techniques: Activation offloading, Activation checkpointing.
  • Offers distributed checkpointing via ByteCheckpoint for efficient saving and merging.
  • Supports a range of models including DeepSeek, Llama 3, and Qwen 2/2.5 variants.
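The summary does not show ByteCheckpoint's actual API, so the save-and-merge flow it mentions can only be illustrated with a stand-in. This sketch uses plain torch.save/torch.load for a single rank; in the real distributed setting, each rank would write its own shard and a merge step would consolidate them.

```python
# Hypothetical stand-in for a distributed save-then-merge checkpoint flow
# (ByteCheckpoint's real API is not shown in this summary).
import os
import tempfile

import torch
import torch.nn as nn

model = nn.Linear(4, 2)  # stand-in for one rank's model shard
with tempfile.TemporaryDirectory() as d:
    shard_path = os.path.join(d, "rank0_shard.pt")
    torch.save(model.state_dict(), shard_path)  # each rank saves its shard
    merged = torch.load(shard_path)             # merge step (trivial with one rank)
    model.load_state_dict(merged)               # resume from consolidated weights
```

The point of a dedicated checkpointing layer is that the per-rank saves happen in parallel, while merging to a single consolidated state dict is deferred to an offline step.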

Maintenance & Community

  • Project released on April 3, 2025.
  • Contributions are welcome via CONTRIBUTING.md.
  • Roadmap to be updated.

Licensing & Compatibility

  • Licensed under Apache License 2.0.
  • Permissive license suitable for commercial use and integration with closed-source projects.

Limitations & Caveats

The "veScale" component for FSDP is not yet available. Some advanced features like expert parallelism are marked as experimental. Performance benchmarks are pending a technical report.

Health Check

  • Last commit: 1 day ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 9
  • Issues (30d): 1

Star History

95 stars in the last 90 days

Explore Similar Projects

Starred by Stas Bekman (Author of Machine Learning Engineering Open Book; Research Engineer at Snowflake) and Zhiqiang Xie (Author of SGLang).

veScale by volcengine

0.1%
839
PyTorch-native framework for LLM training
created 1 year ago
updated 3 weeks ago
Starred by Jeff Hammerbacher (Cofounder of Cloudera) and Stas Bekman (Author of Machine Learning Engineering Open Book; Research Engineer at Snowflake).

InternEvo by InternLM

1.0%
402
Lightweight training framework for model pre-training
created 1 year ago
updated 1 week ago
Starred by Stas Bekman (Author of Machine Learning Engineering Open Book; Research Engineer at Snowflake) and Travis Fischer (Founder of Agentic).

lingua by facebookresearch

0.1%
5k
LLM research codebase for training and inference
created 9 months ago
updated 2 weeks ago
Starred by Andrej Karpathy (Founder of Eureka Labs; Formerly at Tesla, OpenAI; Author of CS 231n), Zhuohan Li (Author of vLLM), and 6 more.

torchtitan by pytorch

0.9%
4k
PyTorch platform for generative AI model training research
created 1 year ago
updated 22 hours ago
Starred by George Hotz (Author of tinygrad; Founder of the tiny corp, comma.ai), Anton Bukov (Cofounder of 1inch Network), and 16 more.

tinygrad by tinygrad

0.1%
30k
Minimalist deep learning framework for education and exploration
created 4 years ago
updated 18 hours ago
Starred by Aravind Srinivas (Cofounder of Perplexity), Stas Bekman (Author of Machine Learning Engineering Open Book; Research Engineer at Snowflake), and 12 more.

DeepSpeed by deepspeedai

0.2%
40k
Deep learning optimization library for distributed training and inference
created 5 years ago
updated 1 day ago