genv by run-ai

GPU environment/cluster manager with LLM support

Created 3 years ago
639 stars

Top 52.0% on SourcePulse

View on GitHub
Project Summary

Genv is an open-source system for managing GPU environments and clusters, designed to simplify GPU resource allocation and sharing for data scientists and ML engineers. It allows users to easily control, configure, monitor, and enforce GPU usage across machines or clusters, enabling efficient collaboration and resource utilization, particularly for LLM development and deployment.

How It Works

Genv operates by creating isolated GPU environments, inspired by tools like pyenv and Conda. Users can activate specific environments with defined GPU counts and memory allocations, abstracting away the underlying hardware. This approach allows seamless switching between GPU resources without modifying code, facilitating fair resource distribution, quota enforcement, and efficient sharing of GPUs for tasks like serving local LLMs.
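Isolation of this kind typically hinges on the CUDA_VISIBLE_DEVICES environment variable: CUDA enumerates only the listed device indices, so a process launched with a restricted list cannot touch the other GPUs. Below is a minimal Python sketch of that general mechanism (illustrative only; run_in_gpu_env is a hypothetical helper, not genv's actual API):

```python
import os
import subprocess

def run_in_gpu_env(cmd, gpu_indices):
    """Launch a command that can only see the given GPU indices.

    CUDA enumerates devices from CUDA_VISIBLE_DEVICES, so the child
    process sees a renumbered subset of the machine's GPUs. This is
    the core trick GPU environment managers build on; genv adds
    bookkeeping, quotas, and multi-machine coordination on top.
    (Hypothetical sketch, not genv's implementation.)
    """
    env = dict(os.environ)
    env["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in gpu_indices)
    return subprocess.run(cmd, env=env)

# Example: the child sees only physical GPU 1, exposed to CUDA as device 0.
# run_in_gpu_env(["python", "train.py"], [1])
```

Because the restriction lives in the process environment rather than in code, the same script can be pointed at different GPUs simply by activating a different environment.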

Quick Start & Requirements

  • Install: pip install genv or conda install -c conda-forge genv
  • Prerequisites: an NVIDIA GPU with compatible drivers. The example uses CUDA 11.4; no specific CUDA version requirement is stated beyond driver compatibility.
  • Setup: after installation, activate an environment with a command such as genv activate --name my-env --gpus 1.
  • Docs: Genv documentation site

Highlighted Details

  • Facilitates sharing GPUs among teammates and pooling resources from multiple machines.
  • Enforces GPU quotas (count and memory) for equitable resource allocation.
  • Integrates with Ollama for managing and serving local LLMs on cluster GPUs.
  • Offers monitoring capabilities via Grafana dashboards for administrators.

Maintenance & Community

  • Developed by Run.ai Labs.
  • Community support and feature discussion available on their Discord server.

Licensing & Compatibility

  • Licensed under AGPLv3. Run.ai intends for AGPLv3 obligations to be interpreted broadly, particularly regarding "work based on the Program" and "Corresponding Source."
  • The broad interpretation of AGPLv3 terms may impose significant obligations on derivative works and linked code, potentially restricting commercial use or integration into closed-source projects.

Limitations & Caveats

The AGPLv3 license, with its broad interpretation clause, presents a significant consideration for commercial adoption or integration into proprietary software due to potential copyleft requirements. Specific CUDA version compatibility beyond the example is not detailed.

Health Check

  • Last Commit: 1 year ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 7 stars in the last 30 days

Explore Similar Projects

Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), Johannes Hagemann (Cofounder of Prime Intellect), and 4 more.

S-LoRA by S-LoRA

0.2%
2k
System for scalable LoRA adapter serving
Created 1 year ago
Updated 1 year ago
Starred by Andrej Karpathy (Founder of Eureka Labs; formerly at Tesla, OpenAI; Author of CS 231n), Jason Knight (Director AI Compilers at NVIDIA; Cofounder of OctoML), and 3 more.

gpu.cpp by AnswerDotAI

0%
4k
C++ library for portable GPU computation using WebGPU
Created 1 year ago
Updated 2 months ago
Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems") and Ying Sheng (Coauthor of SGLang).

fastllm by ztxz16

0.4%
4k
High-performance C++ LLM inference library
Created 2 years ago
Updated 1 week ago