GLake by antgroup

GPU optimization library for memory management and I/O

created 2 years ago
471 stars

Top 65.6% on sourcepulse

Project Summary

GLake is an open-source library that addresses GPU memory and I/O bottlenecks in large AI model training and inference. It targets AI researchers, engineers, and DevOps professionals seeking to maximize hardware utilization. By optimizing GPU memory management and data transfer, GLake delivers significant throughput gains and memory savings.

How It Works

GLake employs a layered architecture for GPU memory and I/O optimization. It provides global and heterogeneous GPU memory pools with features like fragmentation optimization, multi-stream/process memory reuse, and memory security. The core optimization layer offers global allocation, multi-channel concurrency, tiering, memory deduplication, and KV-cache optimization. This approach allows for transparent integration with existing deep learning frameworks like PyTorch, enabling efficient memory pooling, sharing, and tiering across multiple GPUs and tasks.
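GLake itself hooks in below the framework (by swapping the allocator library, as described under Quick Start), so no application code changes are needed. As a rough illustration of what its fragmentation optimization targets, the sketch below uses only stock PyTorch memory-statistics APIs, not any GLake-specific calls, to measure the reserved-but-unallocated gap that GMLake's pooling aims to shrink:

```python
import torch

def report_fragmentation(device: int = 0) -> None:
    """Print how much of the reserved GPU pool is actually in live tensors.

    The reserved-minus-allocated gap is the cached-but-unusable memory that
    GLake/GMLake's defragmentation is designed to reduce. Standard PyTorch
    APIs only; nothing here is part of GLake.
    """
    reserved = torch.cuda.memory_reserved(device)    # bytes held by the caching allocator
    allocated = torch.cuda.memory_allocated(device)  # bytes in live tensors
    gap = reserved - allocated
    frag = gap / reserved if reserved else 0.0
    print(f"reserved={reserved / 2**20:.1f} MiB  "
          f"allocated={allocated / 2**20:.1f} MiB  "
          f"gap={gap / 2**20:.1f} MiB ({frag:.1%} of pool)")

if torch.cuda.is_available():
    # Allocate mixed-size tensors and free one in the middle to provoke
    # a fragmentation hole, then inspect the allocator's pool.
    xs = [torch.empty(s, device="cuda") for s in (2**20, 2**24, 2**22)]
    del xs[1]
    report_fragmentation()
```

Running a report like this with and without GLake installed gives a before/after view of the fragmentation savings quantified under Highlighted Details.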

Quick Start & Requirements

  • Installation: Primarily by replacing underlying libraries (e.g., libcuda.so, libc10_cuda.so) or following the detailed integration steps. A whl package is available for PyTorch 1.13.1 (a quick environment check is sketched after this list).
  • Prerequisites: NVIDIA GPUs, CUDA.
  • Performance: Claimed improvements include training throughput gains of up to 4x, inference memory savings of up to 3x, and I/O acceleration of 3-12x.
  • Documentation: Tutorials for GMLake and multi-path I/O are available.
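As referenced in the installation bullet, a minimal pre-install sanity check, assuming only standard torch attributes (the helper name is hypothetical, not part of GLake):

```python
import torch

def check_glake_prereqs() -> bool:
    """Check the PyTorch version and CUDA availability the GLake whl targets."""
    ok = True
    if not torch.__version__.startswith("1.13.1"):
        print(f"PyTorch {torch.__version__} found; the GLake whl targets 1.13.1")
        ok = False
    if not torch.cuda.is_available():
        print("CUDA not available; GLake requires an NVIDIA GPU")
        ok = False
    else:
        print(f"CUDA {torch.version.cuda}, device: {torch.cuda.get_device_name(0)}")
    return ok

if __name__ == "__main__":
    check_glake_prereqs()
```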

Highlighted Details

  • Reduces memory fragmentation by up to 27% and saves up to 25GB GPU memory.
  • Increases training throughput for a 10B model by up to 4x.
  • Enables cross-process/model memory deduplication for inference, saving up to 3x memory.
  • Accelerates CPU-GPU I/O transmission by up to 3x (a baseline bandwidth measurement is sketched after this list).
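To give the I/O numbers context, the sketch below measures plain single-path host-to-device bandwidth with stock PyTorch (pinned host memory plus CUDA events); GLake's multi-path transfer would be benchmarked against a baseline like this. The function name and parameters are illustrative, not part of GLake's API:

```python
import torch

def h2d_bandwidth_gib_s(size_mb: int = 256, iters: int = 20) -> float:
    """Measure host-to-device copy bandwidth with stock PyTorch.

    A plain single-path baseline; GLake's multi-path I/O acceleration
    would be evaluated against a measurement like this one.
    """
    src = torch.empty(size_mb * 2**20, dtype=torch.uint8).pin_memory()
    dst = torch.empty_like(src, device="cuda")
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)

    dst.copy_(src, non_blocking=True)  # warm-up transfer
    torch.cuda.synchronize()

    start.record()
    for _ in range(iters):
        dst.copy_(src, non_blocking=True)
    end.record()
    torch.cuda.synchronize()

    seconds = start.elapsed_time(end) / 1e3  # elapsed_time returns milliseconds
    return (size_mb * iters / 1024) / seconds  # GiB transferred per second

if torch.cuda.is_available():
    print(f"H2D bandwidth: {h2d_bandwidth_gib_s():.2f} GiB/s")
```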

Maintenance & Community

Development is ongoing with previews of features like serverless GPU memory shrinking and LLM KV cache optimization. Community engagement is encouraged via WeChat.

Licensing & Compatibility

The README does not explicitly state the license type. Compatibility is confirmed with PyTorch 1.13.1.

Limitations & Caveats

Current testing and verification focus on PyTorch and NVIDIA GPUs, with ongoing work to support domestic (Chinese) AI accelerators and emerging interconnects such as CXL. The project is actively adding features, so ongoing changes and API evolution should be expected.

Health Check

  • Last commit: 4 months ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 16 stars in the last 90 days
