llamafile by Mozilla-Ocho

Single-file LLM distribution and runtime via `llama.cpp` and Cosmopolitan Libc

created 1 year ago
22,881 stars

Top 1.8% on sourcepulse

Project Summary

This project provides a framework for distributing and running Large Language Models (LLMs) as single, self-contained executable files called "llamafiles." It aims to make open LLMs highly accessible to developers and end-users by packaging the model weights and inference engine into a portable binary.

How It Works

llamafile combines llama.cpp with Cosmopolitan Libc to create a single executable that runs across multiple operating systems (macOS, Windows, Linux, FreeBSD, OpenBSD, NetBSD) and CPU architectures (AMD64, ARM64). It embeds model weights within a ZIP archive, allowing them to be memory-mapped directly. The framework also supports runtime dispatching for CPU microarchitectures and dynamically compiles GPU support (Metal, CUDA, ROCm) at runtime if the necessary SDKs are present.
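
As a sketch of how this packaging works in practice: per the project README, a custom llamafile is assembled by appending GGUF weights and a `.args` file (default command-line arguments) to the base binary's ZIP section with the bundled `zipalign` tool. File names below are placeholders; consult the README for the canonical invocation.

```sh
# Start from the bare llamafile runtime (no weights embedded).
cp llamafile mymodel.llamafile

# .args holds default arguments, one per line; -m selects the
# weights file to load.
printf '%s\n' -m mymodel.Q4_K_M.gguf > .args

# Append weights and .args to the executable's ZIP section, stored
# uncompressed (-j0) so the weights can be memory-mapped in place.
zipalign -j0 mymodel.llamafile mymodel.Q4_K_M.gguf .args

# The result is a self-contained, runnable llamafile.
./mymodel.llamafile
```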

Quick Start & Requirements

  • Install/Run: Download a `.llamafile` executable and run it directly, e.g. `./llava-v1.5-7b-q4.llamafile` (see the example after this list). On macOS/Linux, `chmod +x` the file first. On Windows, rename it to end in `.exe`.
  • Prerequisites:
    • macOS: Xcode Command Line Tools.
    • Linux/WSL: wget or curl, unzip, make, sha256sum. For GPU support, CUDA SDK (NVIDIA) or ROCm HIP SDK (AMD) are needed.
    • CPU: AMD64 requires AVX; ARM64 requires ARMv8a+.
  • Setup Time: Minimal, as it's a single executable.
  • Links: Announcement Blog Post, llamafile Server README
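
A minimal end-to-end run on macOS/Linux looks like this; the download URL is a placeholder for whichever hosted llamafile you pick:

```sh
# Placeholder URL: substitute the real link for the model you want.
curl -L -o llava-v1.5-7b-q4.llamafile \
  https://example.com/llava-v1.5-7b-q4.llamafile

# Mark it executable (macOS/Linux; on Windows, rename it to .exe instead).
chmod +x llava-v1.5-7b-q4.llamafile

# Starts a local web UI and an OpenAI-compatible API server.
./llava-v1.5-7b-q4.llamafile
```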

Highlighted Details

  • Single-file executables for LLMs, requiring no installation.
  • Supports multiple OSes and CPU architectures via Cosmopolitan Libc.
  • Dynamic GPU compilation (Metal, CUDA, ROCm) at runtime.
  • OpenAI API-compatible endpoints for chat completions and embeddings (see the example after this list).
  • Model weights can be embedded inside the executable or loaded from external GGUF files.
  • Includes security sandboxing (pledge/seccomp) on Linux/OpenBSD.
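
As an illustration of the OpenAI-compatible API mentioned above, the request below assumes a llamafile is already running in server mode on its default local port 8080; the `model` field is accepted for compatibility, but the server answers with whatever weights it has loaded:

```sh
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "LLaMA_CPP",
        "messages": [
          {"role": "system", "content": "You are a helpful assistant."},
          {"role": "user", "content": "Say hello in one sentence."}
        ]
      }'
```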

Maintenance & Community

  • A Mozilla Builders project.
  • Development is active; recent commit activity and maintainer responsiveness are summarized in the Health Check below.

Licensing & Compatibility

  • Project License: Apache 2.0.
  • llama.cpp changes: MIT.
  • Compatible with commercial use and closed-source linking, provided MIT/Apache 2.0 terms are met.

Limitations & Caveats

  • Windows caps executable files at 4GB, so larger models must keep their weights in an external GGUF file (see the sketch after this list).
  • GPU support compilation can fail if SDKs are not correctly installed or if the build environment is not compatible (e.g., MSVC for NVIDIA DLLs).
  • Sandboxing is limited to Linux/OpenBSD without GPUs.
  • Potential conflicts with system binfmt_misc registrations on some Linux systems.
  • CrowdStrike antivirus may interfere with execution.
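
A sketch of the external-weights workaround for the Windows size cap, with hypothetical file names: keep the small llamafile runtime as the `.exe` and pass the weights with `-m`.

```sh
# The renamed .exe stays well under 4GB; the multi-GB weights are
# loaded from a separate GGUF file at startup.
llamafile.exe -m mistral-7b-instruct-v0.2.Q5_K_M.gguf
```
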
Health Check

  • Last commit: 1 month ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 3
  • Issues (30d): 4

Star History

656 stars in the last 90 days

Explore Similar Projects

Starred by Andrej Karpathy (Founder of Eureka Labs; formerly at Tesla, OpenAI; author of CS 231n), Georgios Konstantopoulos (CTO, General Partner at Paradigm), and 2 more.

gpu.cpp by AnswerDotAI

Top 0.2% · 4k stars
C++ library for portable GPU computation using WebGPU
created 1 year ago · updated 2 weeks ago
Starred by Andrej Karpathy (Founder of Eureka Labs; formerly at Tesla, OpenAI; author of CS 231n), Nat Friedman (Former CEO of GitHub), and 32 more.

llama.cpp by ggml-org

Top 0.4% · 84k stars
C/C++ library for local LLM inference
created 2 years ago · updated 10 hours ago