llamafile by Mozilla-Ocho

Single-file LLM distribution and runtime via `llama.cpp` and Cosmopolitan Libc

created 1 year ago
22,881 stars

Top 1.8% on sourcepulse

Project Summary

This project provides a framework for distributing and running Large Language Models (LLMs) as single, self-contained executable files called "llamafiles." It aims to make open LLMs highly accessible to developers and end-users by packaging the model weights and inference engine into a portable binary.

How It Works

llamafile combines llama.cpp with Cosmopolitan Libc to create a single executable that runs across multiple operating systems (macOS, Windows, Linux, FreeBSD, OpenBSD, NetBSD) and CPU architectures (AMD64, ARM64). It embeds model weights within a ZIP archive, allowing them to be memory-mapped directly. The framework also supports runtime dispatching for CPU microarchitectures and dynamically compiles GPU support (Metal, CUDA, ROCm) at runtime if the necessary SDKs are present.
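
As a sketch of how this packaging works in practice: per the project README, a custom llamafile is assembled by appending GGUF weights and a `.args` file (default command-line arguments) to the base binary's ZIP section with the bundled `zipalign` tool. File names below are placeholders; consult the README for the canonical invocation.

```sh
# Start from the bare llamafile runtime (no weights embedded).
cp llamafile mymodel.llamafile

# .args holds default arguments, one per line; -m selects the
# weights file to load.
printf '%s\n' -m mymodel.Q4_K_M.gguf > .args

# Append weights and .args to the executable's ZIP section, stored
# uncompressed (-j0) so the weights can be memory-mapped in place.
zipalign -j0 mymodel.llamafile mymodel.Q4_K_M.gguf .args

# The result is a self-contained, runnable llamafile.
./mymodel.llamafile
```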

Quick Start & Requirements

  • Install/Run: Download a `.llamafile` executable and run it directly, e.g. `./llava-v1.5-7b-q4.llamafile` (see the example after this list). On macOS/Linux, `chmod +x` the file first. On Windows, rename it to end in `.exe`.
  • Prerequisites:
    • macOS: Xcode Command Line Tools.
    • Linux/WSL: wget or curl, unzip, make, sha256sum. For GPU support, CUDA SDK (NVIDIA) or ROCm HIP SDK (AMD) are needed.
    • CPU: AMD64 requires AVX; ARM64 requires ARMv8a+.
  • Setup Time: Minimal, as it's a single executable.
  • Links: Announcement Blog Post, llamafile Server README
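
A minimal end-to-end run on macOS/Linux looks like this; the download URL is a placeholder for whichever hosted llamafile you pick:

```sh
# Placeholder URL: substitute the real link for the model you want.
curl -L -o llava-v1.5-7b-q4.llamafile \
  https://example.com/llava-v1.5-7b-q4.llamafile

# Mark it executable (macOS/Linux; on Windows, rename it to .exe instead).
chmod +x llava-v1.5-7b-q4.llamafile

# Starts a local web UI and an OpenAI-compatible API server.
./llava-v1.5-7b-q4.llamafile
```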

Highlighted Details

  • Single-file executables for LLMs, requiring no installation.
  • Supports multiple OSes and CPU architectures via Cosmopolitan Libc.
  • Dynamic GPU compilation (Metal, CUDA, ROCm) at runtime.
  • OpenAI API-compatible endpoints for chat completions and embeddings (see the example after this list).
  • Model weights can be embedded inside the executable or loaded from external GGUF files.
  • Includes security sandboxing (pledge/seccomp) on Linux/OpenBSD.
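
As an illustration of the OpenAI-compatible API mentioned above, the request below assumes a llamafile is already running in server mode on its default local port 8080; the `model` field is accepted for compatibility, but the server answers with whatever weights it has loaded:

```sh
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "LLaMA_CPP",
        "messages": [
          {"role": "system", "content": "You are a helpful assistant."},
          {"role": "user", "content": "Say hello in one sentence."}
        ]
      }'
```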

Maintenance & Community

  • A Mozilla Builders project.
  • Development is active; recent commit activity and maintainer responsiveness are summarized in the Health Check below.

Licensing & Compatibility

  • Project License: Apache 2.0.
  • llama.cpp changes: MIT.
  • Compatible with commercial use and closed-source linking, provided MIT/Apache 2.0 terms are met.

Limitations & Caveats

  • Windows caps executable files at 4GB, so larger models must keep their weights in an external GGUF file (see the sketch after this list).
  • GPU support compilation can fail if SDKs are not correctly installed or if the build environment is not compatible (e.g., MSVC for NVIDIA DLLs).
  • Sandboxing is limited to Linux/OpenBSD without GPUs.
  • Potential conflicts with system binfmt_misc registrations on some Linux systems.
  • CrowdStrike antivirus may interfere with execution.
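
A sketch of the external-weights workaround for the Windows size cap, with hypothetical file names: keep the small llamafile runtime as the `.exe` and pass the weights with `-m`.

```sh
# The renamed .exe stays well under 4GB; the multi-GB weights are
# loaded from a separate GGUF file at startup.
llamafile.exe -m mistral-7b-instruct-v0.2.Q5_K_M.gguf
```
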
Health Check

  • Last commit: 1 month ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 3
  • Issues (30d): 4

Star History

656 stars in the last 90 days

Explore Similar Projects

Starred by Andrej Karpathy (Founder of Eureka Labs; formerly at Tesla, OpenAI; author of CS 231n), Georgios Konstantopoulos (CTO, General Partner at Paradigm), and 2 more.

gpu.cpp by AnswerDotAI

Top 0.2% · 4k stars
C++ library for portable GPU computation using WebGPU
created 1 year ago · updated 2 weeks ago
Starred by Andrej Karpathy (Founder of Eureka Labs; formerly at Tesla, OpenAI; author of CS 231n), Nat Friedman (Former CEO of GitHub), and 32 more.

llama.cpp by ggml-org

Top 0.4% · 84k stars
C/C++ library for local LLM inference
created 2 years ago · updated 10 hours ago