genv by run-ai

GPU environment/cluster manager with LLM support

created 2 years ago
624 stars

Top 53.8% on sourcepulse

1 Expert Loves This Project
Project Summary

Genv is an open-source system for managing GPU environments and clusters, designed to simplify GPU resource allocation and sharing for data scientists and ML engineers. It allows users to easily control, configure, monitor, and enforce GPU usage across machines or clusters, enabling efficient collaboration and resource utilization, particularly for LLM development and deployment.

How It Works

Genv operates by creating isolated GPU environments, inspired by tools like pyenv and Conda. Users can activate specific environments with defined GPU counts and memory allocations, abstracting away the underlying hardware. This approach allows seamless switching between GPU resources without modifying code, facilitating fair resource distribution, quota enforcement, and efficient sharing of GPUs for tasks like serving local LLMs.
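The workflow described above might look like the following shell session. This is an illustrative sketch: the `activate` flags are taken from the example later in this listing, and exact flag names may differ across genv versions, so check the docs before relying on them.

```shell
# Activate an isolated GPU environment in the current shell,
# requesting one GPU (flags as shown in this listing's example)
genv activate --name my-env --gpus 1

# Tools in this shell now see only the device(s) genv attached,
# not every GPU on the machine
nvidia-smi

# Run the workload unchanged -- no code modifications needed
python train.py

# Release the GPU(s) back to the shared pool when done
genv deactivate
```

Because the environment is bound to the shell session, switching between GPU allocations is just a matter of deactivating one environment and activating another.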

Quick Start & Requirements

  • Install: pip install genv or conda install -c conda-forge genv
  • Prerequisites: an NVIDIA GPU with a compatible driver. The documentation's example uses CUDA 11.4, but no minimum CUDA version is stated beyond driver compatibility.
  • Setup: Installation is straightforward via package managers. Activating an environment involves commands like genv activate --name my-env --gpus 1.
  • Docs: Genv documentation site
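Putting the steps above together, a minimal first run might look like this. The `genv status` check is an assumption about the CLI (a common pattern for inspecting the active environment); verify the subcommand against the genv documentation.

```shell
# Install from PyPI (conda-forge is also available, per the listing)
pip install genv

# Assumed: inspect the current shell's environment state
genv status

# Activate an environment with one GPU, as shown above
genv activate --name my-env --gpus 1
```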

Highlighted Details

  • Facilitates sharing GPUs among teammates and pooling resources from multiple machines.
  • Enforces GPU quotas (count and memory) for equitable resource allocation.
  • Integrates with Ollama for managing and serving local LLMs on cluster GPUs.
  • Offers monitoring capabilities via Grafana dashboards for administrators.
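The memory-quota and Ollama points above could combine roughly as follows. This is a hedged sketch: the `gpu-memory` config key is an assumption based on genv's fractional-GPU feature, and the Ollama integration is shown only as "run Ollama inside an activated environment" — the actual integration commands may differ, so consult the genv docs.

```shell
# Activate an environment capped to one GPU...
genv activate --name llm-env --gpus 1

# ...and (assumed syntax) to a slice of that GPU's memory,
# so teammates can share the remaining capacity
genv config gpu-memory 8g

# Serve a local LLM; Ollama sees only the device and memory
# budget that genv attached to this environment
ollama serve
```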

Maintenance & Community

  • Developed by Run.ai Labs.
  • Community support and feature discussion available on their Discord server.

Licensing & Compatibility

  • Licensed under AGPLv3. Run.ai intends for AGPLv3 obligations to be interpreted broadly, particularly regarding "work based on the Program" and "Corresponding Source."
  • The broad interpretation of AGPLv3 terms may impose significant obligations on derivative works and linked code, potentially restricting commercial use or integration into closed-source projects.

Limitations & Caveats

The AGPLv3 license, with its broad interpretation clause, presents a significant consideration for commercial adoption or integration into proprietary software due to potential copyleft requirements. Specific CUDA version compatibility beyond the example is not detailed.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 20 stars in the last 90 days

Explore Similar Projects

Starred by Jeff Hammerbacher (Cofounder of Cloudera), Stas Bekman (Author of Machine Learning Engineering Open Book; Research Engineer at Snowflake), and 2 more.

gpustack by gpustack

GPU cluster manager for AI model deployment

  • Top 1.6% on sourcepulse
  • 3k stars
  • created 1 year ago
  • updated 3 days ago