llama-stack by llamastack

Composable building blocks for Llama apps

Created 1 year ago
8,074 stars

Top 6.4% on SourcePulse

Project Summary

Llama Stack provides composable building blocks for developing AI applications, standardizing core functionalities like Inference, RAG, Agents, and more. It targets developers seeking a unified API layer and flexible deployment options across various environments, simplifying the creation of production-grade generative AI applications.

How It Works

Llama Stack uses a plugin architecture to support diverse API implementations, letting developers switch between providers and environments without changing application code. The same API surface works whether inference runs locally, on-premises, or through a hosted partner, and an ecosystem of integrated providers supplies the underlying infrastructure and services.
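As an illustration of this pattern, the sketch below shows a provider registry behind a stable interface. All names here are hypothetical and do not reflect Llama Stack's actual internals; it only demonstrates how swapping a provider becomes a configuration change rather than a code change.

```python
from abc import ABC, abstractmethod

# Hypothetical sketch of a provider-plugin pattern. Class and function
# names are illustrative only, not Llama Stack's real API.

class InferenceProvider(ABC):
    """Stable interface the application codes against."""

    @abstractmethod
    def complete(self, prompt: str) -> str: ...

class LocalProvider(InferenceProvider):
    """Stand-in for an on-host backend (e.g. a local runtime)."""
    def complete(self, prompt: str) -> str:
        return f"[local] echo: {prompt}"

class RemoteProvider(InferenceProvider):
    """Stand-in for a hosted API backend."""
    def complete(self, prompt: str) -> str:
        return f"[remote] echo: {prompt}"

# The registry maps a configuration string to an implementation, so the
# application selects a backend by name instead of importing it directly.
PROVIDERS: dict[str, type[InferenceProvider]] = {
    "local": LocalProvider,
    "remote": RemoteProvider,
}

def get_provider(name: str) -> InferenceProvider:
    return PROVIDERS[name]()

# Application code stays identical regardless of which provider is chosen.
provider = get_provider("local")
result = provider.complete("hello")
```

Switching from `"local"` to `"remote"` changes only the lookup key, which is the property the unified API layer provides at a much larger scale.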

Quick Start & Requirements

  • Install: pip install -U llama_stack
  • Run: INFERENCE_MODEL=meta-llama/<MODEL_ID> llama-stack build --run --template meta-reference-gpu
  • Prerequisites: Downloading model weights requires a download URL from Meta. Running Llama 4 models requires an 8xH100 GPU host.
  • Resources: A one-line local install script is available: curl -LsSf https://github.com/meta-llama/llama-stack/raw/main/install.sh | sh
  • Documentation: Llama Stack Quick Start, Documentation, Colab Notebook

Highlighted Details

  • Supports Llama 4 models, including Llama-4-Scout-17B-16E-Instruct.
  • Offers a unified API layer for Inference, RAG, Agents, Tools, Safety, Evals, and Telemetry.
  • Provides multiple developer interfaces: CLI and SDKs for Python, Typescript, iOS, and Android.
  • Integrates with numerous API providers and distributions (e.g., Ollama, TGI, vLLM, Fireworks, AWS Bedrock, Together, Groq).

Maintenance & Community

  • Active development with recent support for Llama 4.
  • Community resources include documentation and example applications.

Licensing & Compatibility

  • The README does not explicitly state the license. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

Running the latest Llama 4 models requires substantial hardware (an 8xH100 GPU host). The license and commercial-use terms are not stated in the provided README.

Health Check

  • Last Commit: 13 hours ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 258
  • Issues (30d): 105
  • Star History: 109 stars in the last 30 days
