ngxson/wllama: WebAssembly binding for on-browser LLM inference
Top 39.5% on SourcePulse
This project provides WebAssembly (WASM) bindings for llama.cpp, enabling large language model (LLM) inference directly in the browser, with no backend server or GPU required. It targets web developers and researchers who want to add LLM capabilities to client-side applications.
How It Works
wllama leverages WebAssembly and SIMD instructions to run llama.cpp models efficiently in the browser. It compiles the C++ llama.cpp library into WASM, allowing it to execute within a web worker to avoid blocking the UI thread. The library supports both high-level APIs for completions and embeddings, and low-level control over KV cache and sampling. Models can be split into smaller chunks for faster parallel downloads and to overcome the 2GB ArrayBuffer size limit.
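A minimal sketch of the high-level flow described above, assuming the library's documented Wllama class with loadModelFromUrl and createCompletion; the WASM asset paths, model URL, and option names shown here are placeholders and may differ between versions.

```ts
import { Wllama } from '@wllama/wllama';

// Paths to the single- and multi-threaded WASM builds; where these are
// served from depends on your bundler setup (placeholder paths here).
const CONFIG_PATHS = {
  'single-thread/wllama.wasm': '/wllama/single-thread/wllama.wasm',
  'multi-thread/wllama.wasm': '/wllama/multi-thread/wllama.wasm',
};

async function run() {
  const wllama = new Wllama(CONFIG_PATHS);

  // The model is downloaded and cached by the browser; a pre-split model
  // can be loaded the same way so its shards download in parallel.
  await wllama.loadModelFromUrl(
    'https://huggingface.co/ggml-org/models/resolve/main/tinyllamas/stories260K.gguf'
  );

  // High-level completion API; low-level KV-cache and sampling control
  // is also exposed but omitted in this sketch.
  const output = await wllama.createCompletion('Once upon a time,', {
    nPredict: 64,
    sampling: { temp: 0.7, top_k: 40, top_p: 0.9 },
  });
  console.log(output);
}

run();
```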
Quick Start & Requirements
npm i @wllama/wllama
The Cross-Origin-Embedder-Policy and Cross-Origin-Opener-Policy headers must be configured on the page that loads the library; a server-configuration sketch follows.
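One way to satisfy the header requirement during development, assuming a Vite dev server (Vite itself is not required by wllama; any server that sets these two headers on the page works). These values make the page cross-origin isolated, which the multi-threaded (SharedArrayBuffer-based) build depends on.

```ts
// vite.config.ts
import { defineConfig } from 'vite';

export default defineConfig({
  server: {
    headers: {
      // Enable cross-origin isolation for the dev server.
      'Cross-Origin-Embedder-Policy': 'require-corp',
      'Cross-Origin-Opener-Policy': 'same-origin',
    },
  },
});
```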
Highlighted Details
Maintenance & Community
Licensing & Compatibility
The project's license is stated in its package.json. Note that it bundles llama.cpp, which is distributed under its own MIT license.
Limitations & Caveats
Models larger than 2 GB must be pre-split into smaller chunks (e.g., with llama.cpp's llama-gguf-split tool) before they can be loaded.