web-llm by mlc-ai

In-browser LLM inference engine using WebGPU for hardware acceleration

created 2 years ago
16,074 stars

Top 3.0% on sourcepulse

Project Summary

WebLLM provides a high-performance inference engine for large language models (LLMs) that runs entirely within the web browser, leveraging WebGPU for hardware acceleration. It targets web developers who want to integrate LLM capabilities directly into their applications, offering privacy and offline functionality without requiring server-side infrastructure.

How It Works

WebLLM uses WebAssembly and WebGPU to execute LLM inference client-side. Models are compiled into a format compatible with the browser environment, enabling efficient execution on the user's hardware. This approach removes the need for backend servers, reducing latency and infrastructure costs, while WebGPU delivers significant performance gains over CPU-based inference.

Quick Start & Requirements

  • Install: npm install @mlc-ai/web-llm or yarn add @mlc-ai/web-llm
  • Prerequisites: Modern web browser with WebGPU support.
  • Setup: Integration via npm/yarn or CDN. Model loading can take significant time on first run.
  • Docs: Documentation
  • Demo: WebLLM Chat

Highlighted Details

  • Full OpenAI API compatibility, including streaming and JSON mode.
  • Supports a wide range of models: Llama 3, Phi 3, Gemma, Mistral, Qwen, and more.
  • Enables custom model integration in MLC format.
  • Offers Web Worker and Service Worker support for performance optimization and persistence.
  • Chrome Extension examples are available.

Maintenance & Community

  • Active development with contributions from a distributed community.
  • Discord for community interaction.
  • Related projects: MLC LLM, WebLLM Chat.

Licensing & Compatibility

  • Apache 2.0 License. Permissive for commercial use and integration into closed-source projects.

Limitations & Caveats

Model loading times can be substantial, and performance is dependent on the user's hardware and browser WebGPU implementation. Function-calling support is marked as Work In Progress (WIP).

Health Check

  • Last commit: 2 months ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 1
  • Issues (30d): 7
  • Star history: 781 stars in the last 90 days
