llama-stack by llamastack

Composable building blocks for Llama apps

Created 1 year ago
8,074 stars

Top 6.4% on SourcePulse

Project Summary

Llama Stack provides composable building blocks for developing AI applications, standardizing core functionalities like Inference, RAG, Agents, and more. It targets developers seeking a unified API layer and flexible deployment options across various environments, simplifying the creation of production-grade generative AI applications.

How It Works

Llama Stack uses a plugin architecture to support diverse API implementations, letting developers switch between providers and environments without changing application code. The same API surface works whether inference runs locally, on-premises, or through a hosted partner, and an ecosystem of integrated providers supplies the underlying infrastructure and services.
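As an illustration of this pattern, the sketch below shows a provider registry behind a stable interface. All names here are hypothetical and do not reflect Llama Stack's actual internals; it only demonstrates how swapping a provider becomes a configuration change rather than a code change.

```python
from abc import ABC, abstractmethod

# Hypothetical sketch of a provider-plugin pattern. Class and function
# names are illustrative only, not Llama Stack's real API.

class InferenceProvider(ABC):
    """Stable interface the application codes against."""

    @abstractmethod
    def complete(self, prompt: str) -> str: ...

class LocalProvider(InferenceProvider):
    """Stand-in for an on-host backend (e.g. a local runtime)."""
    def complete(self, prompt: str) -> str:
        return f"[local] echo: {prompt}"

class RemoteProvider(InferenceProvider):
    """Stand-in for a hosted API backend."""
    def complete(self, prompt: str) -> str:
        return f"[remote] echo: {prompt}"

# The registry maps a configuration string to an implementation, so the
# application selects a backend by name instead of importing it directly.
PROVIDERS: dict[str, type[InferenceProvider]] = {
    "local": LocalProvider,
    "remote": RemoteProvider,
}

def get_provider(name: str) -> InferenceProvider:
    return PROVIDERS[name]()

# Application code stays identical regardless of which provider is chosen.
provider = get_provider("local")
result = provider.complete("hello")
```

Switching from `"local"` to `"remote"` changes only the lookup key, which is the property the unified API layer provides at a much larger scale.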

Quick Start & Requirements

  • Install: pip install -U llama_stack
  • Run: INFERENCE_MODEL=meta-llama/<MODEL_ID> llama-stack build --run --template meta-reference-gpu
  • Prerequisites: Downloading model weights requires a download URL from Meta. Running Llama 4 models requires an 8xH100 GPU host.
  • Resources: A one-line local install script is available: curl -LsSf https://github.com/meta-llama/llama-stack/raw/main/install.sh | sh
  • Documentation: Llama Stack Quick Start, Documentation, Colab Notebook

Highlighted Details

  • Supports Llama 4 models, including Llama-4-Scout-17B-16E-Instruct.
  • Offers a unified API layer for Inference, RAG, Agents, Tools, Safety, Evals, and Telemetry.
  • Provides multiple developer interfaces: CLI and SDKs for Python, Typescript, iOS, and Android.
  • Integrates with numerous API providers and distributions (e.g., Ollama, TGI, vLLM, Fireworks, AWS Bedrock, Together, Groq).

Maintenance & Community

  • Active development with recent support for Llama 4.
  • Community resources include documentation and example applications.

Licensing & Compatibility

  • The README does not explicitly state the license. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

Running the latest Llama 4 models requires substantial hardware (an 8xH100 GPU host). The license and commercial-use terms are not stated in the provided README.

Health Check

  • Last Commit: 13 hours ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 258
  • Issues (30d): 105
  • Star History: 109 stars in the last 30 days
