llama-stack by llamastack

Composable building blocks for Llama apps

Created 1 year ago
8,231 stars

Top 6.3% on SourcePulse

Project Summary

Llama Stack provides composable building blocks for developing AI applications, standardizing core functionalities like Inference, RAG, Agents, and more. It targets developers seeking a unified API layer and flexible deployment options across various environments, simplifying the creation of production-grade generative AI applications.

How It Works

Llama Stack employs a plugin architecture to support diverse API implementations, enabling developers to switch between different providers and environments without altering their application code. This approach offers a consistent experience and leverages a robust ecosystem of integrated partners for tailored infrastructure and services.
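The plugin idea described above can be sketched in a few lines. This is an illustrative pattern, not the actual Llama Stack internals: application code depends only on an interface, and the concrete provider (the class names below are hypothetical) is chosen by configuration rather than a code change.

```python
# Illustrative sketch of a provider-plugin architecture (NOT Llama Stack's
# real internals): app code targets one interface; providers are swapped
# via a config key, so no application code changes between environments.
from typing import Protocol


class InferenceProvider(Protocol):
    def chat(self, prompt: str) -> str: ...


class OllamaProvider:  # hypothetical local provider
    def chat(self, prompt: str) -> str:
        return f"[ollama] {prompt}"


class BedrockProvider:  # hypothetical hosted provider
    def chat(self, prompt: str) -> str:
        return f"[bedrock] {prompt}"


# Registry maps a config string to a provider implementation.
PROVIDERS = {"ollama": OllamaProvider, "bedrock": BedrockProvider}


def run_app(provider_name: str, prompt: str) -> str:
    # The application never names a concrete provider class.
    provider: InferenceProvider = PROVIDERS[provider_name]()
    return provider.chat(prompt)
```

Switching from a local to a hosted backend is then a one-line config change (`"ollama"` to `"bedrock"`), which is the consistency guarantee the plugin architecture is after.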

Quick Start & Requirements

  • Install: pip install -U llama_stack
  • Run: INFERENCE_MODEL=meta-llama/<MODEL_ID> llama-stack build --run --template meta-reference-gpu
  • Prerequisites: Downloading model weights requires a Meta-provided URL. Running Llama 4 models requires an 8xH100 GPU host.
  • Resources: Local setup via curl -LsSf https://github.com/meta-llama/llama-stack/raw/main/install.sh | sh is available.
  • Documentation: Llama Stack Quick Start, Documentation, Colab Notebook
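Once a server from the steps above is running, it can be queried over plain HTTP. A minimal sketch, assuming the server listens on port 8321 and exposes a `/v1/inference/chat-completion` route (both are assumptions; confirm against the Llama Stack documentation linked above):

```python
# Sketch of a request to a locally running Llama Stack server.
# ASSUMPTIONS: port 8321 and the /v1/inference/chat-completion route;
# check your server's startup output and docs for the real values.
import json
import urllib.request


def build_chat_request(base_url: str, model_id: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a chat-completion request."""
    body = json.dumps({
        "model_id": model_id,
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/v1/inference/chat-completion",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Sending requires a live server, e.g.:
#   with urllib.request.urlopen(build_chat_request(
#           "http://localhost:8321", "meta-llama/<MODEL_ID>", "Hello")) as resp:
#       print(resp.read())
```

The official Python and TypeScript SDKs wrap this same API; the raw-HTTP form is shown only because it is dependency-free.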

Highlighted Details

  • Supports Llama 4 models, including Llama-4-Scout-17B-16E-Instruct.
  • Offers a unified API layer for Inference, RAG, Agents, Tools, Safety, Evals, and Telemetry.
  • Provides multiple developer interfaces: CLI and SDKs for Python, TypeScript, iOS, and Android.
  • Integrates with numerous API providers and distributions (e.g., Ollama, TGI, vLLM, Fireworks, AWS Bedrock, Together, Groq).

Maintenance & Community

  • Active development with recent support for Llama 4.
  • Community resources include documentation and example applications.

Licensing & Compatibility

  • The README does not explicitly state the license. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

Running the latest Llama 4 models requires significant hardware resources (8xH100 GPUs). The license and commercial use terms are not clearly defined in the provided README.

Health Check

  • Last Commit: 1 day ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 105
  • Issues (30d): 101
  • Star History: 54 stars in the last 30 days

Explore Similar Projects

Starred by Chip Huyen (author of "AI Engineering" and "Designing Machine Learning Systems"), Luis Capelo (cofounder of Lightning AI), and 1 more.

  • any-llm by mozilla-ai: Unified interface for LLM providers. Top 1.2%, 2k stars. Created 6 months ago; updated 1 day ago.