VeOmni by ByteDance-Seed

Framework for scaling multimodal model training across accelerators

Created 5 months ago
1,086 stars

Top 35.0% on SourcePulse

Project Summary

VeOmni is a PyTorch-native framework designed for scaling large model training across diverse accelerators. It targets researchers and engineers working with single- or multi-modal models, offering flexibility and control by avoiding rigid trainer classes and exposing the full training logic.

How It Works

VeOmni emphasizes a modular, trainer-free design: users integrate custom components into plain, linear training scripts, keeping the full training logic visible. It builds on PyTorch-native APIs for broad compatibility and performance, composing parallelism through DeviceMesh with FSDP1/2, sequence parallelism, and experimental expert parallelism. Memory-management features such as activation offloading and activation checkpointing are built in to improve efficiency.
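The trainer-free idea can be illustrated with a minimal, linear PyTorch loop. This is only a sketch of the style the summary describes; the model, data, and names below are placeholders, not VeOmni APIs:

```python
import torch
from torch import nn

# Hypothetical stand-ins for a real model and dataset.
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# The whole training logic sits in one linear script: no Trainer
# class hides the forward/backward/step sequence.
for step in range(10):
    x = torch.randn(8, 16)           # fake batch
    y = torch.randint(0, 4, (8,))    # fake labels
    logits = model(x)
    loss = loss_fn(logits, y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

print(float(loss))
```

Because every step is explicit, swapping in custom components (a different loss, a sharded model wrapper, a custom data loader) is a one-line change rather than a trainer subclass.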

Quick Start & Requirements

  • Install: pip3 install veomni, or pip3 install -e . from a source checkout.
  • Prerequisites: Python and PyTorch. Specific training examples may additionally require datasets (e.g., fineweb) and model weights (e.g., Qwen2.5-7B).
  • Quick Start: bash train.sh $TRAIN_SCRIPT $CONFIG.yaml
  • Examples: Detailed examples for Qwen2.5-VL, Qwen2.5, and Llama3 are provided.
  • Docs: VeOmni Best Practice

Highlighted Details

  • Supports multiple parallelism strategies composed via DeviceMesh: FSDP1/2, sequence parallelism, and experimental expert parallelism.
  • Integrates memory-saving techniques: Activation offloading, Activation checkpointing.
  • Offers distributed checkpointing via ByteCheckpoint for efficient saving and merging.
  • Supports a range of models including DeepSeek, Llama 3, and Qwen 2/2.5 variants.
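Activation checkpointing, one of the memory-saving techniques listed above, can be sketched with PyTorch's built-in utility. This illustrates the general technique only; it is not VeOmni's own API:

```python
import torch
from torch import nn
from torch.utils.checkpoint import checkpoint

block = nn.Sequential(nn.Linear(64, 64), nn.GELU(), nn.Linear(64, 64))
x = torch.randn(4, 64, requires_grad=True)

# Intermediate activations inside `block` are discarded during the
# forward pass and recomputed during backward, trading extra compute
# for lower peak memory.
out = checkpoint(block, x, use_reentrant=False)
out.sum().backward()

print(x.grad.shape)
```

The gradient on `x` is identical to the un-checkpointed version; only the memory/compute trade-off changes, which is why it composes cleanly with offloading and sharded training.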

Maintenance & Community

  • Project released on April 3, 2025.
  • Contributions are welcome via CONTRIBUTING.md.
  • Roadmap to be updated.

Licensing & Compatibility

  • Licensed under Apache License 2.0.
  • Permissive license suitable for commercial use and integration with closed-source projects.

Limitations & Caveats

The "veScale" component for FSDP is not yet available. Some advanced features like expert parallelism are marked as experimental. Performance benchmarks are pending a technical report.

Health Check

  • Last commit: 3 weeks ago
  • Responsiveness: 1 week
  • Pull requests (30d): 9
  • Issues (30d): 16
  • Star history: 220 stars in the last 30 days

Explore Similar Projects

Starred by Luca Soldaini (Research Scientist at Ai2), Edward Sun (Research Scientist at Meta Superintelligence Lab), and 4 more.

parallelformers by tunib-ai
0% · 790 stars
Toolkit for easy model parallelization
Created 4 years ago · Updated 2 years ago
Starred by Théophile Gervet (Cofounder of Genesis AI), Jason Knight (Director AI Compilers at NVIDIA; Cofounder of OctoML), and 6 more.

lingua by facebookresearch
0.1% · 5k stars
LLM research codebase for training and inference
Created 11 months ago · Updated 2 months ago
Starred by Andrej Karpathy (Founder of Eureka Labs; formerly at Tesla, OpenAI; author of CS 231n), Lewis Tunstall (Research Engineer at Hugging Face), and 13 more.

torchtitan by pytorch
0.7% · 4k stars
PyTorch platform for generative AI model training research
Created 1 year ago · Updated 19 hours ago