MegaTrain by DLYuanGod

Train massive LLMs and VLMs on a single GPU

Created 5 days ago

New!

307 stars

Top 87.4% on SourcePulse

View on GitHub
Project Summary

Summary

MegaTrain enables full-precision training of 100B+ parameter LLMs on a single GPU, addressing prohibitive hardware costs. Detailed in arXiv 2604.05091, it targets researchers and engineers who need to scale LLM training without massive distributed infrastructure, democratizing access to large-model development.

How It Works

A RAM-centric architecture stores parameters in host (CPU) RAM, treating GPUs as transient compute engines to overcome VRAM limitations. It employs double-buffered execution to overlap CPU-GPU weight transfers with computation, along with gradient checkpointing and manual gradient computation. MegaTrain supports hybrid attention (linear + full) and MoE layers, automatically adapting to diverse model architectures.
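The double-buffered execution pattern can be sketched in plain Python. This is an illustrative stand-in, not MegaTrain's actual API: a background worker "copies" the next layer's weights from host RAM while the current layer computes, so transfer and compute overlap.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins: host_weights plays the role of parameters in
# CPU RAM; copy_to_device simulates an async host-to-GPU transfer; and
# compute simulates one layer's forward pass on the device-side buffer.
host_weights = [[float(i)] * 4 for i in range(6)]  # 6 "layers" in CPU RAM

def copy_to_device(layer_idx):
    # Real system: an asynchronous H2D copy (e.g. a non-blocking .to("cuda"))
    return list(host_weights[layer_idx])

def compute(x, device_weights):
    # Toy "layer": add the sum of the layer's weights to the activation
    return x + sum(device_weights)

def forward_double_buffered(x, num_layers):
    with ThreadPoolExecutor(max_workers=1) as copier:
        next_buf = copier.submit(copy_to_device, 0)      # prefetch layer 0
        for i in range(num_layers):
            cur = next_buf.result()                      # wait for transfer
            if i + 1 < num_layers:
                next_buf = copier.submit(copy_to_device, i + 1)  # prefetch
            x = compute(x, cur)        # compute overlaps with the next copy
    return x

result = forward_double_buffered(0.0, 6)
```

Only two layer-sized device buffers are ever live at once, which is how the scheme keeps VRAM usage independent of total model size.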

Quick Start & Requirements

Install via git clone https://github.com/DLYuanGod/MegaTrain.git && cd MegaTrain && pip install -e . (an editable install). Requires Python 3.9+ and PyTorch 2.0+. Optional performance dependencies include flash-attn, flash-linear-attention, causal-conv1d, and deepspeed. Crucially, use scripts/calc_resource.py to determine the optimal batch_size for your specific hardware.

Highlighted Details

  • Enables training of 120B+ parameter models on a single GPU.
  • Supports any HuggingFace decoder-only LLM or VLM via AutoModel.
  • Handles hybrid attention (linear + full) and MoE layers automatically.
  • Claims 1.84x speedup over DeepSpeed ZeRO-3 on 14B models.
  • Features LlamaFactory-style data registry (Alpaca, ShareGPT, JSON, HF Hub).
  • Configuration via YAML files, with 25+ pre-made examples.
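The gradient checkpointing and manual gradient computation described under How It Works can be sketched in pure Python. This is an illustrative toy, not MegaTrain's implementation: the forward pass stores only each layer's input (the checkpoint), and the backward pass recomputes the layer before applying a hand-written derivative instead of an autograd tape.

```python
# Toy "layer": scale the activation by a scalar weight w.
def layer_forward(x, w):
    return x * w

def layer_grad(x, w, grad_out):
    # Manual derivatives of y = x * w: (dL/dx, dL/dw)
    return grad_out * w, grad_out * x

def train_step(x, weights):
    checkpoints = []
    for w in weights:                 # forward: keep inputs, drop activations
        checkpoints.append(x)
        x = layer_forward(x, w)
    loss_grad = 1.0                   # dL/dy for the trivial loss L = y
    weight_grads = [0.0] * len(weights)
    for i in reversed(range(len(weights))):
        x_in = checkpoints[i]
        _ = layer_forward(x_in, weights[i])   # recompute from checkpoint
        loss_grad, weight_grads[i] = layer_grad(x_in, weights[i], loss_grad)
    return x, weight_grads

out, grads = train_step(2.0, [3.0, 4.0])
# L = x * w0 * w1, so dL/dw0 = x * w1 and dL/dw1 = x * w0
```

Trading the recomputation in the backward pass for not storing activations is what lets memory stay proportional to one layer rather than the whole network.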

Maintenance & Community

The README does not detail specific maintenance practices, notable contributors, sponsorships, or community channels like Discord or Slack.

Licensing & Compatibility

Licensed under the Apache-2.0 License, permitting commercial use and integration into closed-source projects.

Limitations & Caveats

Designed for decoder-only models; encoder-decoder architectures are unsupported. Accurate batch_size configuration via the resource calculator is critical to prevent OOM errors or inefficient GPU utilization. The project appears research-oriented, and production stability is not explicitly documented.

Health Check
Last Commit

9 hours ago

Responsiveness

Inactive

Pull Requests (30d)
1
Issues (30d)
2
Star History
315 stars in the last 5 days

Explore Similar Projects

Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems") and Yineng Zhang (Inference Lead at SGLang; Research Scientist at Together AI).

rtp-llm by alibaba

0.3%
1k
LLM inference engine for diverse applications
Created 2 years ago
Updated 6 hours ago
Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems") and Ying Sheng (Coauthor of SGLang).

fastllm by ztxz16

0.1%
4k
High-performance C++ LLM inference library
Created 2 years ago
Updated 2 days ago
Starred by Andrej Karpathy (Founder of Eureka Labs; Formerly at Tesla, OpenAI; Author of CS 231n), Jiayi Pan (Author of SWE-Gym; MTS at xAI), and 34 more.

flash-attention by Dao-AILab

0.6%
23k
Fast, memory-efficient attention implementation
Created 3 years ago
Updated 1 day ago
Starred by Tobi Lutke (Cofounder of Shopify), Andrej Karpathy (Founder of Eureka Labs; Formerly at Tesla, OpenAI; Author of CS 231n), and 41 more.

unsloth by unslothai

2.6%
61k
Finetuning tool for LLMs, targeting speed and memory efficiency
Created 2 years ago
Updated 20 hours ago