GLake by antgroup

GPU optimization library for memory management and I/O

created 2 years ago
471 stars

Top 65.6% on sourcepulse

Project Summary

GLake is an open-source library that addresses GPU memory and I/O bottlenecks in large AI model training and inference. It targets AI researchers, engineers, and DevOps professionals seeking to maximize hardware utilization. By optimizing GPU memory management and data transfer, GLake delivers significant throughput gains and memory savings.

How It Works

GLake employs a layered architecture for GPU memory and I/O optimization. It provides global and heterogeneous GPU memory pools with features like fragmentation optimization, multi-stream/process memory reuse, and memory security. The core optimization layer offers global allocation, multi-channel concurrency, tiering, memory deduplication, and KV-cache optimization. This approach allows for transparent integration with existing deep learning frameworks like PyTorch, enabling efficient memory pooling, sharing, and tiering across multiple GPUs and tasks.
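GLake itself hooks in below the framework (by swapping the allocator library, as described under Quick Start), so no application code changes are needed. As a rough illustration of what its fragmentation optimization targets, the sketch below uses only stock PyTorch memory-statistics APIs, not any GLake-specific calls, to measure the reserved-but-unallocated gap that GMLake's pooling aims to shrink:

```python
import torch

def report_fragmentation(device: int = 0) -> None:
    """Print how much of the reserved GPU pool is actually in live tensors.

    The reserved-minus-allocated gap is the cached-but-unusable memory that
    GLake/GMLake's defragmentation is designed to reduce. Standard PyTorch
    APIs only; nothing here is part of GLake.
    """
    reserved = torch.cuda.memory_reserved(device)    # bytes held by the caching allocator
    allocated = torch.cuda.memory_allocated(device)  # bytes in live tensors
    gap = reserved - allocated
    frag = gap / reserved if reserved else 0.0
    print(f"reserved={reserved / 2**20:.1f} MiB  "
          f"allocated={allocated / 2**20:.1f} MiB  "
          f"gap={gap / 2**20:.1f} MiB ({frag:.1%} of pool)")

if torch.cuda.is_available():
    # Allocate mixed-size tensors and free one in the middle to provoke
    # a fragmentation hole, then inspect the allocator's pool.
    xs = [torch.empty(s, device="cuda") for s in (2**20, 2**24, 2**22)]
    del xs[1]
    report_fragmentation()
```

Running a report like this with and without GLake installed gives a before/after view of the fragmentation savings quantified under Highlighted Details.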

Quick Start & Requirements

  • Installation: Primarily by replacing underlying libraries (e.g., libcuda.so, libc10_cuda.so) or following the detailed integration steps. A whl package is available for PyTorch 1.13.1 (a quick environment check is sketched after this list).
  • Prerequisites: NVIDIA GPUs, CUDA.
  • Performance: Claimed improvements include training throughput gains of up to 4x, inference memory savings of up to 3x, and I/O acceleration of 3-12x.
  • Documentation: Tutorials for GMLake and multi-path I/O are available.
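As referenced in the installation bullet, a minimal pre-install sanity check, assuming only standard torch attributes (the helper name is hypothetical, not part of GLake):

```python
import torch

def check_glake_prereqs() -> bool:
    """Check the PyTorch version and CUDA availability the GLake whl targets."""
    ok = True
    if not torch.__version__.startswith("1.13.1"):
        print(f"PyTorch {torch.__version__} found; the GLake whl targets 1.13.1")
        ok = False
    if not torch.cuda.is_available():
        print("CUDA not available; GLake requires an NVIDIA GPU")
        ok = False
    else:
        print(f"CUDA {torch.version.cuda}, device: {torch.cuda.get_device_name(0)}")
    return ok

if __name__ == "__main__":
    check_glake_prereqs()
```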

Highlighted Details

  • Reduces memory fragmentation by up to 27% and saves up to 25GB GPU memory.
  • Increases training throughput for a 10B model by up to 4x.
  • Enables cross-process/model memory deduplication for inference, saving up to 3x memory.
  • Accelerates CPU-GPU I/O transmission by up to 3x (a baseline bandwidth measurement is sketched after this list).
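To give the I/O numbers context, the sketch below measures plain single-path host-to-device bandwidth with stock PyTorch (pinned host memory plus CUDA events); GLake's multi-path transfer would be benchmarked against a baseline like this. The function name and parameters are illustrative, not part of GLake's API:

```python
import torch

def h2d_bandwidth_gib_s(size_mb: int = 256, iters: int = 20) -> float:
    """Measure host-to-device copy bandwidth with stock PyTorch.

    A plain single-path baseline; GLake's multi-path I/O acceleration
    would be evaluated against a measurement like this one.
    """
    src = torch.empty(size_mb * 2**20, dtype=torch.uint8).pin_memory()
    dst = torch.empty_like(src, device="cuda")
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)

    dst.copy_(src, non_blocking=True)  # warm-up transfer
    torch.cuda.synchronize()

    start.record()
    for _ in range(iters):
        dst.copy_(src, non_blocking=True)
    end.record()
    torch.cuda.synchronize()

    seconds = start.elapsed_time(end) / 1e3  # elapsed_time returns milliseconds
    return (size_mb * iters / 1024) / seconds  # GiB transferred per second

if torch.cuda.is_available():
    print(f"H2D bandwidth: {h2d_bandwidth_gib_s():.2f} GiB/s")
```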

Maintenance & Community

Development is ongoing with previews of features like serverless GPU memory shrinking and LLM KV cache optimization. Community engagement is encouraged via WeChat.

Licensing & Compatibility

The README does not explicitly state the license type. Compatibility is confirmed with PyTorch 1.13.1.

Limitations & Caveats

Current testing and verification focus on PyTorch and NVIDIA GPUs, with ongoing work to support domestic (Chinese) AI accelerators and emerging interconnects such as CXL. The project is actively adding features, so ongoing changes and API evolution should be expected.

Health Check

  • Last commit: 4 months ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 16 stars in the last 90 days
