ramalama by containers

CLI tool for simplifying local AI model serving via containers

created 1 year ago
1,945 stars

Top 23.0% on sourcepulse

View on GitHub
Project Summary

RamaLama is an open-source developer tool designed to simplify the local serving and production inference of AI models using OCI containers. It targets developers and researchers who want to manage AI models efficiently without complex host system configurations, offering a secure, containerized environment for model execution.

How It Works

RamaLama leverages container engines like Podman or Docker to pull OCI images tailored to the host's detected hardware (CPU, NVIDIA CUDA, AMD ROCm, Apple Silicon, etc.). This approach abstracts away the need for manual dependency management and environment setup on the host machine. Models are then pulled from various registries (Hugging Face, Ollama, OCI) and run within isolated, rootless containers, enhancing security through read-only mounts, network isolation, and capability dropping.
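
As a rough sketch of this workflow, models can be addressed with transport prefixes (the model names and registry paths below are illustrative placeholders, not real artifacts):

    # Pull models from different registries via transport prefixes
    ramalama pull ollama://tinyllama
    ramalama pull hf://example-org/example-model-GGUF
    ramalama pull oci://quay.io/example/tinyllama:latest

    # Run a model interactively; RamaLama selects a container image
    # matching the detected hardware (CPU, CUDA, ROCm, Apple Silicon)
    ramalama run tinyllama

The transport prefix only affects where the model is fetched from; by default, execution happens inside the isolated container either way.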

Quick Start & Requirements

  • Install: pip install ramalama, or on Fedora 40+, sudo dnf install python3-ramalama. macOS users can use the install script: curl -fsSL https://raw.githubusercontent.com/containers/ramalama/s/install.sh | bash. (A first-run sketch follows this list.)
  • Prerequisites: a container engine (Podman or Docker) is recommended. Hardware acceleration (NVIDIA GPUs, AMD GPUs, Apple Silicon) may require additional host configuration; NVIDIA users should consult the ramalama-cuda(7) documentation.
  • Setup: the initial container image download can take time.
  • Docs: https://ramalama.ai
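
A minimal first session might look like the following; tinyllama is used only as an example model shortname, and the subcommands are assumed to behave as described in the project docs:

    # Install the CLI and confirm it is on PATH
    pip install ramalama
    ramalama version

    # Fetch a small model and list locally stored models
    ramalama pull tinyllama
    ramalama list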

Highlighted Details

  • Supports multiple AI model registries: Hugging Face, Ollama, and OCI-compliant registries.
  • Employs robust security measures: container isolation, read-only mounts, no network access, auto-cleanup, dropped Linux capabilities, and no new privileges.
  • Facilitates model management with commands for pulling, listing, serving, and stopping models (see the sketch after this list).
  • Offers shortname aliases for models and an optional web UI for served models.
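
For the serve/stop lifecycle mentioned above, a hedged sketch (the --name and --port options are assumptions based on the man pages; the model shortname is illustrative):

    # Serve a model over HTTP on port 8080 under a friendly name
    ramalama serve --name demo --port 8080 tinyllama

    # Later, stop the named serving container
    ramalama stop demo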

Maintenance & Community

  • Community discussions happen via Matrix; GitHub Issues and PRs are used for bugs and feature requests.
  • Project relies on and credits: llama.cpp, whisper.cpp, vLLM, Podman, Hugging Face.

Licensing & Compatibility

  • License: not explicitly stated in the README, though the project is open source. Suitability for commercial use or closed-source linking is not detailed.

Limitations & Caveats

  • The project is described as being "in development" and in "alpha," indicating potential for breaking changes.
  • Known issue with macOS Python certificate installation may cause SSL errors.
  • NVIDIA GPU users need to consult ramalama-cuda(7) for proper host configuration.

Health Check

  • Last commit: 1 day ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 105
  • Issues (30d): 42

Star History

  • 397 stars in the last 90 days
