lmms-engine by EvolvingLMMs-Lab

Unified engine for training large-scale multimodal AI models

Created 3 months ago
465 stars

Top 65.2% on SourcePulse

View on GitHub
Project Summary

A simple, unified multimodal model training engine designed for lean, flexible, and scalable development. It targets researchers and engineers needing an efficient framework to train complex multimodal models at scale, offering significant optimizations for both distributed training and memory/compute efficiency.

How It Works

LMMS-Engine employs a modular architecture using Factory and Builder patterns for extensibility. Its core strength lies in advanced optimization techniques, including PyTorch 2.0+ FSDP2 for distributed training, Ulysses Sequence Parallelism for handling ultra-long contexts, and Triton-fused kernels (Liger) for substantial memory reduction. It also integrates novel optimizers like Muon and efficient attention mechanisms such as Native Sparse Attention (NSA) and Flash Attention with unpadding, aiming to maximize Model FLOPs Utilization (MFU).
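
As a rough illustration of that registry-and-builder approach, the sketch below shows the general pattern. It is not the actual lmms-engine API; every class and function name in it is hypothetical.

    # Minimal, hypothetical sketch of a Factory/Builder-style component registry.
    # It illustrates the extensibility pattern described above; it is NOT the
    # actual lmms-engine API.
    from typing import Callable, Dict

    MODEL_REGISTRY: Dict[str, Callable] = {}

    def register_model(name: str):
        """Factory hook: decorate a constructor to make it discoverable by name."""
        def wrap(fn: Callable) -> Callable:
            MODEL_REGISTRY[name] = fn
            return fn
        return wrap

    @register_model("toy-vlm")
    def build_toy_vlm(hidden_size: int = 256):
        # Stand-in for a real model constructor.
        return {"type": "toy-vlm", "hidden_size": hidden_size}

    class TrainerBuilder:
        """Builder pattern: assemble a training pipeline step by step."""
        def __init__(self):
            self._parts = {}

        def with_model(self, name: str, **kwargs):
            self._parts["model"] = MODEL_REGISTRY[name](**kwargs)
            return self

        def with_optimizer(self, name: str, lr: float):
            self._parts["optimizer"] = {"name": name, "lr": lr}
            return self

        def build(self) -> dict:
            return dict(self._parts)

    pipeline = TrainerBuilder().with_model("toy-vlm").with_optimizer("adamw", 1e-5).build()
    print(pipeline)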

Quick Start & Requirements

Installation involves cloning the repository, synchronizing dependencies with uv sync, and optionally installing performance-enhancing packages such as flash-attn and liger-kernel. Training is launched using torchrun or accelerate launch with a configuration YAML file (see the sketch after the list below).

  • Primary Install: git clone https://github.com/EvolvingLMMs-Lab/lmms-engine.git && cd lmms-engine && uv sync
  • Prerequisites: Python 3.11+, PyTorch 2.0+. Optional: CUDA for performance kernels.
  • Links: GitHub, Documentation
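
The sketch below shows, in broad strokes, what preparing a config and launching might look like. The config keys, file names, and entry-point script are assumptions made for illustration, not the documented lmms-engine schema.

    # Illustrative only: the real config schema and entry-point script are defined
    # in lmms-engine's documentation; every key and path below is hypothetical.
    import yaml  # requires PyYAML

    config = {
        "model": {"name": "qwen2-vl"},                  # hypothetical key names
        "data": {"train_path": "data/train.jsonl"},
        "training": {"per_device_batch_size": 1,
                     "learning_rate": 1e-5,
                     "bf16": True},
    }

    with open("train_config.yaml", "w") as f:
        yaml.safe_dump(config, f, sort_keys=False)

    # A launch then looks roughly like (entry-point name is hypothetical):
    #   torchrun --nproc_per_node=8 train.py --config train_config.yaml
    # or:
    #   accelerate launch train.py --config train_config.yaml
    print("Wrote train_config.yaml")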

Highlighted Details

  • Supports more than 19 model architectures across vision-language, diffusion, and language domains, including the Qwen series, LLaVA, and BAGEL.
  • Achieves high efficiency through techniques like Sequence Packing (35-40% MFU), Liger Kernels (30% memory reduction), and Ulysses Sequence Parallelism for 10K+ token contexts.
  • Provides comprehensive Model FLOPs Utilization (MFU) metrics for benchmarking various configurations (a worked MFU example follows this list).
  • Extensible via a component registry and a flexible training pipeline builder pattern.
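
As context for the MFU figures above, the sketch below shows the standard way MFU is computed: achieved training FLOPs divided by the hardware's peak FLOPs, using the common 6 x N FLOPs-per-token estimate. The numbers in the example are generic assumptions, not lmms-engine benchmarks.

    # Minimal sketch of how Model FLOPs Utilization (MFU) is commonly computed.
    # The 6 * N approximation (forward + backward FLOPs per token for an
    # N-parameter transformer) and the peak-FLOPs figure below are generic
    # assumptions, not values taken from lmms-engine.

    def mfu(params: float, tokens_per_second: float, peak_flops_per_second: float) -> float:
        """Return achieved / peak FLOPs as a fraction in [0, 1]."""
        achieved_flops = 6.0 * params * tokens_per_second  # fwd + bwd estimate
        return achieved_flops / peak_flops_per_second

    # Example: a 7B-parameter model processing 4,000 tokens/s per GPU on hardware
    # with a 989 TFLOP/s bf16 peak (an H100-class figure, used only as an example).
    print(f"MFU = {mfu(7e9, 4_000, 989e12):.1%}")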

Maintenance & Community

The project is developed by LMMs-Lab, with a website available at https://lmms-lab.com/. No specific community channels (like Discord/Slack) or public roadmaps are detailed in the README.

Licensing & Compatibility

This project is licensed under the Apache 2.0 License, which permits commercial use and modification.

Limitations & Caveats

Native Sparse Attention (NSA) is currently supported only for the BAGEL model. Some features, such as Ulysses Sequence Parallelism (USP) for BAGEL, are marked "TBD" in the example table, indicating ongoing development or incomplete integration. As a v0.1 release, expect rapid changes and an evolving feature set.

Health Check

  • Last Commit: 1 day ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 35
  • Issues (30d): 2
  • Star History: 443 stars in the last 30 days
