In-browser LLM inference engine using WebGPU for hardware acceleration
Top 3.0% on sourcepulse
WebLLM provides a high-performance inference engine for large language models (LLMs) that runs entirely within the web browser, leveraging WebGPU for hardware acceleration. It targets developers building web applications who want to integrate LLM capabilities directly into their user experience, offering privacy and offline functionality without requiring server-side infrastructure.
How It Works
WebLLM executes LLM inference entirely client-side using WebAssembly and WebGPU. Models are compiled ahead of time into a browser-compatible format so that weights and GPU kernels run efficiently on the user's hardware. This removes the need for backend servers, cutting latency and infrastructure costs, while WebGPU typically delivers substantial speedups over CPU-only inference.
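Because everything runs in the browser, a page will usually want to confirm WebGPU support before attempting to load a model. The sketch below uses only the standard browser WebGPU API (navigator.gpu), not WebLLM itself; it assumes WebGPU type definitions (e.g. @webgpu/types) are available in the project.

```ts
// Minimal WebGPU feature check before initializing an in-browser LLM engine.
// Uses the standard browser API; no WebLLM code involved.
async function hasWebGPU(): Promise<boolean> {
  // navigator.gpu is only defined in WebGPU-capable browsers.
  if (!("gpu" in navigator)) return false;
  // requestAdapter() resolves to null when no suitable GPU adapter exists.
  const adapter = await navigator.gpu.requestAdapter();
  return adapter !== null;
}
```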
Quick Start & Requirements
npm install @mlc-ai/web-llm
or yarn add @mlc-ai/web-llm
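A minimal usage sketch based on WebLLM's OpenAI-style chat API. The model ID below is illustrative; check the project's prebuilt model list for current identifiers.

```ts
import { CreateMLCEngine } from "@mlc-ai/web-llm";

// Model ID is illustrative; see the WebLLM docs for currently supported models.
const engine = await CreateMLCEngine("Llama-3.1-8B-Instruct-q4f32_1-MLC");

// OpenAI-style chat completion, executed entirely in the browser.
const reply = await engine.chat.completions.create({
  messages: [{ role: "user", content: "Explain WebGPU in one sentence." }],
});
console.log(reply.choices[0]?.message.content);
```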
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
Initial model download and loading times can be substantial, and performance depends on the user's hardware and the browser's WebGPU implementation. Function-calling support is marked as Work In Progress (WIP).
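Because the first load can take a while, it helps to surface progress to the user. The sketch below assumes WebLLM's engine config accepts an initProgressCallback and that the report exposes progress and text fields; verify against the current API docs.

```ts
import { CreateMLCEngine, InitProgressReport } from "@mlc-ai/web-llm";

// Model ID is illustrative; replace with one from the prebuilt model list.
const engine = await CreateMLCEngine("Llama-3.1-8B-Instruct-q4f32_1-MLC", {
  // Called repeatedly while weights are downloaded and kernels are prepared;
  // wire this into a progress bar or status line in the page UI.
  initProgressCallback: (report: InitProgressReport) => {
    console.log(`${Math.round(report.progress * 100)}% - ${report.text}`);
  },
});
```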