VeOmni by ByteDance-Seed

Framework for scaling multimodal model training across accelerators

Created 11 months ago
1,657 stars

Top 25.0% on SourcePulse

View on GitHub
Project Summary

VeOmni is a PyTorch-native framework designed for scaling large model training across diverse accelerators. It targets researchers and engineers working with single- or multi-modal models, offering flexibility and control by avoiding rigid trainer classes and exposing the full training logic.

How It Works

VeOmni emphasizes a modular, trainer-free design, allowing users to integrate custom components and maintain linear training scripts for maximum transparency. It leverages PyTorch's native functions for broad compatibility and performance, supporting advanced parallelism strategies like DeviceMesh, FSDP1/2, and experimental expert parallelism. Features such as activation offloading and checkpointing are integrated to manage memory and improve efficiency.
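The "trainer-free, linear script" idea can be sketched in plain PyTorch. This is a hypothetical illustration of the style VeOmni describes, not code from the framework; the model, data, and hyperparameters are all made up for the example:

```python
# Hypothetical sketch of a trainer-free, linear training script:
# every step (forward, loss, backward, update) is explicit in the
# script rather than hidden inside a Trainer class.
import torch
from torch import nn

model = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 1))
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

# Toy data: 64 random samples (illustrative only).
inputs = torch.randn(64, 8)
targets = torch.randn(64, 1)

for step in range(10):
    optimizer.zero_grad()
    loss = loss_fn(model(inputs), targets)  # full training logic is visible
    loss.backward()                         # no hidden callbacks or hooks
    optimizer.step()
```

Because the loop is ordinary Python, users can swap in custom parallelism wrappers, logging, or checkpointing at any point without subclassing anything.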

Quick Start & Requirements

  • Install: pip3 install veomni, or pip3 install -e . from a cloned source tree.
  • Prerequisites: PyTorch, Python. Specific model training examples may require datasets (e.g., fineweb) and model weights (e.g., Qwen2.5-7B).
  • Quick Start: bash train.sh $TRAIN_SCRIPT $CONFIG.yaml
  • Examples: Detailed examples for Qwen2.5-VL, Qwen2.5, and Llama3 are provided.
  • Docs: VeOmni Best Practice

Highlighted Details

  • Supports various parallelism strategies: DeviceMesh, FSDP1/2, Expert parallelism (experimental), Sequence parallelism.
  • Integrates memory-saving techniques: Activation offloading, Activation checkpointing.
  • Offers distributed checkpointing via ByteCheckpoint for efficient saving and merging.
  • Supports a range of models including DeepSeek, Llama 3, and Qwen 2/2.5 variants.
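Of the memory-saving techniques listed above, activation checkpointing is available directly in PyTorch via torch.utils.checkpoint; a minimal sketch is below. VeOmni's own integration may differ, and the model here is an assumed toy example:

```python
# Minimal activation-checkpointing sketch using PyTorch's built-in
# torch.utils.checkpoint (VeOmni's integration may look different).
import torch
from torch import nn
from torch.utils.checkpoint import checkpoint

class CheckpointedMLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.blocks = nn.ModuleList(
            nn.Sequential(nn.Linear(16, 16), nn.ReLU()) for _ in range(4)
        )

    def forward(self, x):
        for block in self.blocks:
            # Activations inside `block` are recomputed during backward
            # instead of being stored, trading compute for memory.
            x = checkpoint(block, x, use_reentrant=False)
        return x

model = CheckpointedMLP()
x = torch.randn(2, 16, requires_grad=True)
out = model(x)
out.sum().backward()
```

The trade-off is one extra forward pass through each checkpointed block at backward time, in exchange for not holding its intermediate activations in memory.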

Maintenance & Community

  • Project released on April 3, 2025.
  • Contributions are welcome via CONTRIBUTING.md.
  • Roadmap to be updated.

Licensing & Compatibility

  • Licensed under Apache License 2.0.
  • Permissive license suitable for commercial use and integration with closed-source projects.

Limitations & Caveats

The "veScale" component for FSDP is not yet available. Some advanced features like expert parallelism are marked as experimental. Performance benchmarks are pending a technical report.

Health Check
Last Commit

19 hours ago

Responsiveness

Inactive

Pull Requests (30d)
75
Issues (30d)
12
Star History
85 stars in the last 30 days

Explore Similar Projects

Starred by Luca Soldaini (Research Scientist at Ai2), Edward Sun (Research Scientist at Meta Superintelligence Lab), and 4 more.

parallelformers by tunib-ai

0%
791
Toolkit for easy model parallelization
Created 4 years ago
Updated 2 years ago
Starred by Théophile Gervet (Cofounder of Genesis AI), Jason Knight (Director AI Compilers at NVIDIA; Cofounder of OctoML), and 7 more.

lingua by facebookresearch

0.0%
5k
LLM research codebase for training and inference
Created 1 year ago
Updated 7 months ago
Starred by Andrej Karpathy (Founder of Eureka Labs; Formerly at Tesla, OpenAI; Author of CS 231n), Woosuk Kwon (Coauthor of vLLM), and 15 more.

torchtitan by pytorch

0.2%
5k
PyTorch platform for generative AI model training research
Created 2 years ago
Updated 20 hours ago
Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), Jiaming Song (Chief Scientist at Luma AI), and 23 more.

Megatron-LM by NVIDIA

0.3%
15k
Framework for training transformer models at scale
Created 7 years ago
Updated 19 hours ago