llama-stack by meta-llama

Composable building blocks for Llama apps

created 1 year ago
7,937 stars

Top 6.6% on sourcepulse

Project Summary

Llama Stack provides composable building blocks for developing AI applications, standardizing core functionalities like Inference, RAG, Agents, and more. It targets developers seeking a unified API layer and flexible deployment options across various environments, simplifying the creation of production-grade generative AI applications.

How It Works

Llama Stack employs a plugin architecture to support diverse API implementations, enabling developers to switch between different providers and environments without altering their application code. This approach offers a consistent experience and leverages a robust ecosystem of integrated partners for tailored infrastructure and services.
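The provider-swap idea behind this plugin architecture can be sketched in plain Python. The names below (`InferenceProvider`, `chat`, the two provider classes) are illustrative stand-ins, not the actual Llama Stack interfaces; the point is only that application code targets an interface, so backends can be exchanged via configuration:

```python
from typing import Protocol


class InferenceProvider(Protocol):
    """Hypothetical provider interface, for illustration only."""

    def chat(self, prompt: str) -> str: ...


class LocalProvider:
    """Stands in for a local backend such as Ollama."""

    def chat(self, prompt: str) -> str:
        return f"[local] {prompt}"


class HostedProvider:
    """Stands in for a hosted backend such as Fireworks or Together."""

    def chat(self, prompt: str) -> str:
        return f"[hosted] {prompt}"


def run_app(provider: InferenceProvider) -> str:
    # Application code depends only on the interface, never on a
    # concrete backend, so providers can be swapped without edits here.
    return provider.chat("Hello, Llama!")


print(run_app(LocalProvider()))  # swap in HostedProvider() without touching run_app
```

In Llama Stack itself, the equivalent switch happens through distribution and provider configuration rather than code changes, which is what keeps the application portable across environments.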

Quick Start & Requirements

  • Install: pip install -U llama_stack
  • Run: INFERENCE_MODEL=meta-llama/<MODEL_ID> llama-stack build --run --template meta-reference-gpu
  • Prerequisites: Downloading model weights requires a Meta-issued URL. Running Llama 4 models requires an 8xH100 GPU host.
  • Resources: Local setup via curl -LsSf https://github.com/meta-llama/llama-stack/raw/main/install.sh | sh is available.
  • Documentation: Llama Stack Quick Start, Documentation, Colab Notebook

Highlighted Details

  • Supports Llama 4 models, including Llama-4-Scout-17B-16E-Instruct.
  • Offers a unified API layer for Inference, RAG, Agents, Tools, Safety, Evals, and Telemetry.
  • Provides multiple developer interfaces: a CLI plus SDKs for Python, TypeScript, iOS, and Android.
  • Integrates with numerous API providers and distributions (e.g., Ollama, TGI, vLLM, Fireworks, AWS Bedrock, Together, Groq).

Maintenance & Community

  • Active development with recent support for Llama 4.
  • Community resources include documentation and example applications.

Licensing & Compatibility

  • The README does not explicitly state the license. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

Running the latest Llama 4 models requires significant hardware (an 8xH100 GPU host), and the license and commercial-use terms are not stated in the README.

Health Check

  • Last commit: 1 day ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 324
  • Issues (30d): 109
  • Star History: 245 stars in the last 90 days

Explore Similar Projects

Starred by Chip Huyen (author of AI Engineering and Designing Machine Learning Systems), Jerry Liu (cofounder of LlamaIndex), and 4 more.

llama-hub by run-llama

Data loaders for LLMs (deprecated, now in LlamaIndex core)

  • 3k stars
  • created 2 years ago, updated 1 year ago