ollama-helm by otwld

Helm chart for deploying Ollama on Kubernetes

created 1 year ago
472 stars

Top 65.5% on sourcepulse

Project Summary

This Helm chart deploys Ollama on Kubernetes so that users can run large language models inside their own cluster. It targets Kubernetes users, particularly those who need GPU acceleration for LLM inference, and simplifies setting up and managing Ollama instances.

How It Works

The chart deploys Ollama as a Kubernetes Deployment with configurable resource allocation, GPU integration (NVIDIA and AMD), and persistent storage via PersistentVolumeClaims. It can also pull models at startup and create custom models from templates.
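
The options above are set through Helm values. The following is a minimal sketch, not the chart's definitive schema: the key names (ollama.gpu, ollama.models.pull, persistentVolume), the file name ollama-values.yaml, the model name, and the volume size are assumptions for illustration and should be checked against the output of helm show values otwld/ollama.

```shell
#!/bin/sh
# Write a hypothetical values file for the chart.
# ASSUMPTION: key names follow the chart's values.yaml; verify them with
#   helm show values otwld/ollama
cat > ollama-values.yaml <<'EOF'
ollama:
  gpu:
    enabled: true      # turn on GPU integration
    type: nvidia       # or "amd"
    number: 1          # GPUs requested per pod
  models:
    pull:              # models fetched when the container starts
      - llama3.1
persistentVolume:
  enabled: true        # keep downloaded models across pod restarts
  size: 30Gi           # illustrative size, adjust to the models you pull
EOF

echo "wrote ollama-values.yaml"
```

The file would then be passed to Helm with the -f flag, for example: helm upgrade --install ollama otwld/ollama --namespace ollama --create-namespace -f ollama-values.yaml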

Quick Start & Requirements

  • Install:
    helm repo add otwld https://otwld.github.io/ollama-helm/
    helm repo update
    helm install ollama otwld/ollama --namespace ollama --create-namespace
    
  • Requirements: Kubernetes >= 1.16.0-0 for CPU-only deployments, >= 1.26.0-0 for GPU deployments. GPU support also requires compatible hardware and the matching NVIDIA or AMD drivers.
  • Docs: Ollama Documentation, Ollama-Helm Chart
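
The version requirements can be checked before installing. The sketch below is one way to do it, assuming GNU sort with the -V (version sort) option is available; the version_ge helper and the hard-coded CLUSTER_VERSION are hypothetical, and in practice the cluster version would come from kubectl version. The "-0" suffix in the requirement is a Helm semver convention that admits pre-release builds and is dropped here for the comparison.

```shell
#!/bin/sh
# Minimum Kubernetes versions quoted in the requirements above.
MIN_CPU="1.16.0"
MIN_GPU="1.26.0"

# version_ge A B: succeeds when version A is greater than or equal to B.
# After a version sort, the smaller of the two comes first, so A >= B
# exactly when the first line equals B.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Hypothetical cluster version, hard-coded for illustration.
CLUSTER_VERSION="1.24.3"

if version_ge "$CLUSTER_VERSION" "$MIN_GPU"; then
  SUPPORT="cpu-and-gpu"
elif version_ge "$CLUSTER_VERSION" "$MIN_CPU"; then
  SUPPORT="cpu-only"
else
  SUPPORT="unsupported"
fi

echo "Kubernetes $CLUSTER_VERSION: $SUPPORT"
```

With the illustrative version 1.24.3, the cluster meets the CPU requirement but not the GPU one.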

Highlighted Details

  • GPU support for NVIDIA and AMD, including MIG for NVIDIA.
  • Ability to pull and run specified models on startup.
  • Support for creating models from templates or ConfigMaps.
  • Optional Ingress configuration for external access.
  • Persistent storage for Ollama data.
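
Two of the features listed above, template-based model creation and Ingress, can be sketched as Helm values. This is an illustration under assumptions rather than the chart's confirmed schema: the key names (ollama.models.create, ollama.models.run, ingress.hosts), the host ollama.example.com, and the model name llama3.1-ctx32768 are hypothetical and should be verified with helm show values otwld/ollama.

```shell
#!/bin/sh
# Write a hypothetical values file covering model creation and Ingress.
# ASSUMPTION: key names mirror the chart's values.yaml; check before use.
cat > ollama-extras.yaml <<'EOF'
ollama:
  models:
    create:
      # Derive a custom model from a base model via a Modelfile template.
      - name: llama3.1-ctx32768
        template: |
          FROM llama3.1
          PARAMETER num_ctx 32768
    run:
      # Load the derived model into memory at startup.
      - llama3.1-ctx32768
ingress:
  enabled: true
  hosts:
    - host: ollama.example.com   # placeholder host name
      paths:
        - path: /
          pathType: Prefix
EOF

echo "wrote ollama-extras.yaml"
```

Several values files can be combined by repeating the -f flag on helm install or helm upgrade; later files override earlier ones.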

Maintenance & Community

  • Maintained by Jean Baptiste Detroyes and Nathan Tréhout.
  • Community support via OTWLD Discord and Ollama-Helm GitHub issues.

Licensing & Compatibility

  • The README does not state the chart's license explicitly; check the LICENSE file in the repository before reuse or redistribution. Ollama's own license should be consulted for the terms that apply to the software the chart deploys.

Limitations & Caveats

  • GPU support varies with the specific hardware and Kubernetes version. Not every GPU is guaranteed to work, AMD GPUs in particular.
  • Upgrading from older chart versions (0.X.X to 1.X.X) requires migration of model configuration.
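
The migration mentioned above can be sketched as a before/after pair of values files. This assumes the change is the one described in the chart's upgrade notes, where the flat model list under ollama.models moves to ollama.models.pull; the file names are hypothetical, and the exact keys should be confirmed in the README before upgrading.

```shell
#!/bin/sh
# ASSUMED 0.X.X layout: models listed directly under ollama.models.
cat > values-0x.yaml <<'EOF'
ollama:
  models:
    - mistral
    - llama2
EOF

# ASSUMED 1.X.X layout: the same list nested under ollama.models.pull.
cat > values-1x.yaml <<'EOF'
ollama:
  models:
    pull:
      - mistral
      - llama2
EOF

# Show the difference between the two layouts.
# diff exits non-zero when files differ, so the status is ignored here.
diff values-0x.yaml values-1x.yaml || true
```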

Health Check

  • Last commit: 4 days ago
  • Responsiveness: 1 day
  • Pull requests (30d): 8
  • Issues (30d): 4
  • Star history: 55 stars in the last 90 days
