WebAssembly binding for on-browser LLM inference
This project provides WebAssembly (WASM) bindings for llama.cpp, enabling large language model (LLM) inference directly in the browser without a backend server or GPU. It targets web developers and researchers who want to integrate LLM capabilities into client-side applications.
How It Works
wllama leverages WebAssembly and SIMD instructions to run llama.cpp models efficiently in the browser. It compiles the C++ llama.cpp library into WASM, allowing it to execute within a web worker to avoid blocking the UI thread. The library supports both high-level APIs for completions and embeddings, and low-level control over KV cache and sampling. Models can be split into smaller chunks for faster parallel downloads and to overcome the 2GB ArrayBuffer size limit.
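As an illustration of the high-level API, here is a minimal sketch of generating an embedding. It assumes a createEmbedding method on an already-initialized Wllama instance (see the Quick Start below for setup); check the project README for the exact method names.

```ts
// Sketch only: assumes `wllama` is an initialized Wllama instance with an
// embedding-capable model already loaded, as in the Quick Start below.
const embedding: number[] = await wllama.createEmbedding(
  'WebAssembly makes in-browser inference possible.'
);
console.log(`embedding dimension: ${embedding.length}`);
```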
Quick Start & Requirements
npm i @wllama/wllama
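A minimal usage sketch follows. It assumes the package's Wllama class with loadModelFromUrl and createCompletion methods; the WASM asset paths and the model URL are placeholders, so consult the project README for the exact configuration keys for your bundler.

```ts
import { Wllama } from '@wllama/wllama';

// Placeholder paths to the WASM binaries shipped with the package; the exact
// keys and asset locations depend on your bundler setup.
const CONFIG_PATHS = {
  'single-thread/wllama.wasm': '/assets/single-thread/wllama.wasm',
  'multi-thread/wllama.wasm': '/assets/multi-thread/wllama.wasm',
};

async function main() {
  const wllama = new Wllama(CONFIG_PATHS);

  // Download and load a GGUF model; larger models can be split into chunks
  // to stay under the 2GB ArrayBuffer limit.
  await wllama.loadModelFromUrl(
    'https://huggingface.co/<user>/<repo>/resolve/main/model.q4_k_m.gguf'
  );

  // High-level completion API with basic sampling parameters.
  const output = await wllama.createCompletion('Once upon a time,', {
    nPredict: 64,
    sampling: { temp: 0.7, top_k: 40, top_p: 0.9 },
  });
  console.log(output);
}

main().catch(console.error);
```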
The Cross-Origin-Embedder-Policy and Cross-Origin-Opener-Policy headers must be configured so the page is cross-origin isolated, which multi-threaded WASM requires; a sample dev-server configuration is sketched below.
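For example, if the app is served with a Vite dev server (an assumption about your tooling), the headers can be set in the server configuration:

```ts
// vite.config.ts -- serve the app with cross-origin isolation enabled
import { defineConfig } from 'vite';

export default defineConfig({
  server: {
    headers: {
      'Cross-Origin-Opener-Policy': 'same-origin',
      'Cross-Origin-Embedder-Policy': 'require-corp',
    },
  },
});
```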
Highlighted Details
Maintenance & Community
Licensing & Compatibility
The license is noted in the package.json reference. However, it bundles llama.cpp, which has its own license (likely MIT or a similarly permissive license).
Limitations & Caveats
Models larger than 2GB must be split into smaller chunks before loading (e.g., with llama-gguf-split).