Single-file LLM distribution and runtime via `llama.cpp` and Cosmopolitan Libc
This project provides a framework for distributing and running Large Language Models (LLMs) as single, self-contained executable files called "llamafiles." It aims to make open LLMs highly accessible to developers and end-users by packaging the model weights and inference engine into a portable binary.
How It Works
llamafile combines `llama.cpp` with Cosmopolitan Libc to create a single executable that runs across multiple operating systems (macOS, Windows, Linux, FreeBSD, OpenBSD, NetBSD) and CPU architectures (AMD64, ARM64). It embeds the model weights within a ZIP archive, allowing them to be memory-mapped directly. The framework also supports runtime dispatching for CPU microarchitectures and dynamically compiles GPU support (Metal, CUDA, ROCm) at runtime if the necessary SDKs are present.
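The ZIP embedding is what makes a llamafile self-contained. As a rough sketch of how weights end up inside the executable, assuming the `zipalign` utility that ships with the llamafile project and a local GGUF weights file (names and flags below are illustrative and may differ between releases):

```sh
# Start from the bare llamafile runtime, which contains the
# inference engine but no weights.
cp llamafile mymodel.llamafile

# Append the GGUF weights into the executable's embedded ZIP
# archive, stored uncompressed and aligned so the runtime can
# memory-map them directly from disk.
zipalign -j0 mymodel.llamafile mymodel.gguf

# The result is a single self-contained executable.
./mymodel.llamafile
```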
Quick Start & Requirements
Download a `.llamafile` executable and run it directly (e.g., `./llava-v1.5-7b-q4.llamafile`). On macOS/Linux, `chmod +x` is required first; on Windows, rename the file to end in `.exe`. Building from source requires `wget` or `curl`, `unzip`, `make`, and `sha256sum`. For GPU support, the CUDA SDK (NVIDIA) or the ROCm HIP SDK (AMD) is needed.
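Concretely, a first run on macOS or Linux looks like the following sketch; the download URL is a placeholder for wherever the llamafile is actually hosted:

```sh
# Download a llamafile (placeholder URL; substitute the real host).
curl -L -o llava-v1.5-7b-q4.llamafile \
  https://example.com/llava-v1.5-7b-q4.llamafile

# Grant execute permission (macOS/Linux only; on Windows, rename
# the file to llava-v1.5-7b-q4.exe instead).
chmod +x llava-v1.5-7b-q4.llamafile

# Run the model; the inference engine and weights are all inside
# this one file.
./llava-v1.5-7b-q4.llamafile
```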
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
Execution may conflict with existing `binfmt_misc` registrations on some Linux systems.
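A frequently cited workaround is to register Cosmopolitan's `ape` loader with the kernel so llamafiles bypass conflicting handlers. The recipe below follows Cosmopolitan's published instructions but is reproduced here as a sketch; verify the URL and magic string against the current documentation:

```sh
# Install the actually-portable-executable (APE) loader.
sudo wget -O /usr/bin/ape https://cosmo.zip/pub/cosmos/bin/ape-$(uname -m).elf
sudo chmod +x /usr/bin/ape

# Register the APE magic with binfmt_misc so the kernel hands
# llamafiles to the loader instead of a conflicting handler.
sudo sh -c "echo ':APE:M::MZqFpD::/usr/bin/ape:' > /proc/sys/fs/binfmt_misc/register"
```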