DLYuanGod/MegaTrain: Train massive LLMs and VLMs on a single GPU
Top 87.4% on SourcePulse
Summary
MegaTrain enables full-precision training of 100B+ parameter LLMs on a single GPU, addressing prohibitive hardware costs. Detailed in arXiv 2604.05091, it targets researchers and engineers who need to scale LLM training without massive distributed infrastructure, democratizing access to large-model development.
How It Works
A RAM-centric architecture stores parameters in host (CPU) RAM, treating GPUs as transient compute engines to overcome VRAM limitations. It employs double-buffered execution for overlapped CPU-GPU weight transfer, gradient checkpointing, and manual gradient computation. MegaTrain supports hybrid attention (linear + full) and MoE layers, automatically adapting to diverse model architectures.
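The double-buffered idea above can be illustrated with a simplified, framework-free sketch: while one buffer's weights are being used for computation, a background thread fills the other buffer with the next layer's weights from host RAM. The real implementation would use CUDA streams and pinned memory rather than threads, and `fetch_weights`/`compute` here are hypothetical hooks, not MegaTrain APIs.

```python
import threading

def double_buffered_forward(layers, fetch_weights, compute):
    """Compute each layer while prefetching the next layer's weights.

    Two buffers alternate (ping-pong): compute uses buffers[i % 2]
    while a background thread fills buffers[(i + 1) % 2] with the
    next layer's weights staged from host RAM.
    """
    buffers = [None, None]
    buffers[0] = fetch_weights(layers[0])  # prime the first buffer
    outputs = []
    for i, layer in enumerate(layers):
        prefetch = None
        if i + 1 < len(layers):
            # Overlap: start loading the next layer's weights now.
            def _load(slot=(i + 1) % 2, nxt=layers[i + 1]):
                buffers[slot] = fetch_weights(nxt)
            prefetch = threading.Thread(target=_load)
            prefetch.start()
        outputs.append(compute(layer, buffers[i % 2]))  # current buffer
        if prefetch is not None:
            prefetch.join()  # guarantee the next buffer is ready
    return outputs
```

With a real GPU backend, the `join` would be a stream synchronization, so the transfer cost of layer i+1 hides behind the compute time of layer i.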
Quick Start & Requirements
Install via: git clone https://github.com/DLYuanGod/MegaTrain.git && cd MegaTrain && pip install -e . (an editable install). Requires Python 3.9+ and PyTorch 2.0+. Optional performance dependencies include flash-attn, flash-linear-attention, causal-conv1d, and deepspeed. Crucially, use scripts/calc_resource.py to determine the optimal batch_size for your specific hardware.
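The README does not show calc_resource.py's internals, but the kind of estimate such a calculator performs can be sketched as below. This is a hypothetical simplification: the real script would also account for weights staged from host RAM, optimizer state, and checkpointed activations.

```python
def estimate_batch_size(vram_gb, activation_gb_per_sample, reserved_gb=2.0):
    """Rough upper bound on batch size from free VRAM.

    Hypothetical illustration, not MegaTrain's actual formula:
    usable VRAM (after a fixed reserve) divided by the per-sample
    activation footprint, floored, with a minimum of 1 when any
    memory is usable and 0 otherwise.
    """
    usable = vram_gb - reserved_gb
    if usable <= 0:
        return 0  # not enough memory even for the reserve
    return max(1, int(usable // activation_gb_per_sample))
```

Overshooting this bound risks OOM errors; undershooting wastes the GPU, which is why the project stresses running the calculator before training.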
Highlighted Details
AutoModel.

Maintenance & Community
The README does not detail specific maintenance practices, notable contributors, sponsorships, or community channels such as Discord or Slack.
Licensing & Compatibility
Licensed under the Apache-2.0 License, permitting commercial use and integration into closed-source projects.
Limitations & Caveats
Designed for decoder-only models; encoder-decoder architectures are unsupported. Accurate batch_size configuration via the resource calculator is critical to prevent OOM errors or inefficient utilization. The project appears research-oriented, with production stability not explicitly detailed.