lmms-engine by EvolvingLMMs-Lab

Unified engine for training large-scale multimodal AI models

Created 3 months ago
465 stars

Top 65.2% on SourcePulse

View on GitHub
Project Summary

A simple, unified multimodal model training engine designed for lean, flexible, and scalable development. It targets researchers and engineers needing an efficient framework to train complex multimodal models at scale, offering significant optimizations for both distributed training and memory/compute efficiency.

How It Works

LMMS-Engine employs a modular architecture using Factory and Builder patterns for extensibility. Its core strength lies in advanced optimization techniques, including PyTorch 2.0+ FSDP2 for distributed training, Ulysses Sequence Parallelism for handling ultra-long contexts, and Triton-fused kernels (Liger) for substantial memory reduction. It also integrates novel optimizers like Muon and efficient attention mechanisms such as Native Sparse Attention (NSA) and Flash Attention with unpadding, aiming to maximize Model FLOPs Utilization (MFU).
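
As a rough illustration of that registry-and-builder approach, the sketch below shows the general pattern. It is not the actual lmms-engine API; every class and function name in it is hypothetical.

    # Minimal, hypothetical sketch of a Factory/Builder-style component registry.
    # It illustrates the extensibility pattern described above; it is NOT the
    # actual lmms-engine API.
    from typing import Callable, Dict

    MODEL_REGISTRY: Dict[str, Callable] = {}

    def register_model(name: str):
        """Factory hook: decorate a constructor to make it discoverable by name."""
        def wrap(fn: Callable) -> Callable:
            MODEL_REGISTRY[name] = fn
            return fn
        return wrap

    @register_model("toy-vlm")
    def build_toy_vlm(hidden_size: int = 256):
        # Stand-in for a real model constructor.
        return {"type": "toy-vlm", "hidden_size": hidden_size}

    class TrainerBuilder:
        """Builder pattern: assemble a training pipeline step by step."""
        def __init__(self):
            self._parts = {}

        def with_model(self, name: str, **kwargs):
            self._parts["model"] = MODEL_REGISTRY[name](**kwargs)
            return self

        def with_optimizer(self, name: str, lr: float):
            self._parts["optimizer"] = {"name": name, "lr": lr}
            return self

        def build(self) -> dict:
            return dict(self._parts)

    pipeline = TrainerBuilder().with_model("toy-vlm").with_optimizer("adamw", 1e-5).build()
    print(pipeline)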

Quick Start & Requirements

Installation involves cloning the repository, synchronizing dependencies with uv sync, and optionally installing performance-enhancing packages such as flash-attn and liger-kernel. Training is launched using torchrun or accelerate launch with a configuration YAML file (see the sketch after the list below).

  • Primary Install: git clone https://github.com/EvolvingLMMs-Lab/lmms-engine.git && cd lmms-engine && uv sync
  • Prerequisites: Python 3.11+, PyTorch 2.0+. Optional: CUDA for performance kernels.
  • Links: GitHub, Documentation
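
The sketch below shows, in broad strokes, what preparing a config and launching might look like. The config keys, file names, and entry-point script are assumptions made for illustration, not the documented lmms-engine schema.

    # Illustrative only: the real config schema and entry-point script are defined
    # in lmms-engine's documentation; every key and path below is hypothetical.
    import yaml  # requires PyYAML

    config = {
        "model": {"name": "qwen2-vl"},                  # hypothetical key names
        "data": {"train_path": "data/train.jsonl"},
        "training": {"per_device_batch_size": 1,
                     "learning_rate": 1e-5,
                     "bf16": True},
    }

    with open("train_config.yaml", "w") as f:
        yaml.safe_dump(config, f, sort_keys=False)

    # A launch then looks roughly like (entry-point name is hypothetical):
    #   torchrun --nproc_per_node=8 train.py --config train_config.yaml
    # or:
    #   accelerate launch train.py --config train_config.yaml
    print("Wrote train_config.yaml")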

Highlighted Details

  • Supports more than 19 model architectures across vision-language, diffusion, and language domains, including the Qwen series, LLaVA, and BAGEL.
  • Achieves high efficiency through techniques like Sequence Packing (35-40% MFU), Liger Kernels (30% memory reduction), and Ulysses Sequence Parallelism for 10K+ token contexts.
  • Provides comprehensive Model FLOPs Utilization (MFU) metrics for benchmarking various configurations (a worked MFU example follows this list).
  • Extensible via a component registry and a flexible training pipeline builder pattern.
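
As context for the MFU figures above, the sketch below shows the standard way MFU is computed: achieved training FLOPs divided by the hardware's peak FLOPs, using the common 6 x N FLOPs-per-token estimate. The numbers in the example are generic assumptions, not lmms-engine benchmarks.

    # Minimal sketch of how Model FLOPs Utilization (MFU) is commonly computed.
    # The 6 * N approximation (forward + backward FLOPs per token for an
    # N-parameter transformer) and the peak-FLOPs figure below are generic
    # assumptions, not values taken from lmms-engine.

    def mfu(params: float, tokens_per_second: float, peak_flops_per_second: float) -> float:
        """Return achieved / peak FLOPs as a fraction in [0, 1]."""
        achieved_flops = 6.0 * params * tokens_per_second  # fwd + bwd estimate
        return achieved_flops / peak_flops_per_second

    # Example: a 7B-parameter model processing 4,000 tokens/s per GPU on hardware
    # with a 989 TFLOP/s bf16 peak (an H100-class figure, used only as an example).
    print(f"MFU = {mfu(7e9, 4_000, 989e12):.1%}")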

Maintenance & Community

The project is developed by LMMs-Lab, with a website available at https://lmms-lab.com/. No specific community channels (like Discord/Slack) or public roadmaps are detailed in the README.

Licensing & Compatibility

This project is licensed under the Apache 2.0 License, which permits commercial use and modification.

Limitations & Caveats

Native Sparse Attention (NSA) is currently supported only for the BAGEL model. Some features, such as Ulysses Sequence Parallelism (USP) for BAGEL, are marked "TBD" in the example table, indicating ongoing development or incomplete integration. As a v0.1 release, expect rapid changes and an evolving feature set.

Health Check

  • Last Commit: 1 day ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 35
  • Issues (30d): 2
  • Star History: 443 stars in the last 30 days
