web-llm by mlc-ai

In-browser LLM inference engine using WebGPU for hardware acceleration

created 2 years ago
16,074 stars

Top 3.0% on sourcepulse

Project Summary

WebLLM provides a high-performance inference engine for large language models (LLMs) that runs entirely within the web browser, leveraging WebGPU for hardware acceleration. It targets web developers who want to integrate LLM capabilities directly into their applications, offering privacy and offline functionality without requiring server-side infrastructure.

How It Works

WebLLM uses WebAssembly and WebGPU to execute LLM inference client-side. Models are compiled into a format compatible with the browser environment, enabling efficient execution on the user's hardware. This approach removes the need for backend servers, reducing latency and infrastructure costs, while WebGPU delivers significant performance gains over CPU-based inference.

Quick Start & Requirements

  • Install: npm install @mlc-ai/web-llm or yarn add @mlc-ai/web-llm
  • Prerequisites: Modern web browser with WebGPU support.
  • Setup: Integration via npm/yarn or CDN. Model loading can take significant time on first run.
  • Docs: Documentation
  • Demo: WebLLM Chat

Highlighted Details

  • Full OpenAI API compatibility, including streaming and JSON mode.
  • Supports a wide range of models: Llama 3, Phi 3, Gemma, Mistral, Qwen, and more.
  • Enables custom model integration in MLC format.
  • Offers Web Worker and Service Worker support for performance optimization and persistence.
  • Chrome Extension examples are available.

Maintenance & Community

  • Active development with contributions from a distributed community.
  • Discord for community interaction.
  • Related projects: MLC LLM, WebLLM Chat.

Licensing & Compatibility

  • Apache 2.0 License. Permissive for commercial use and integration into closed-source projects.

Limitations & Caveats

Model loading times can be substantial, and performance is dependent on the user's hardware and browser WebGPU implementation. Function-calling support is marked as Work In Progress (WIP).

Health Check

  • Last commit: 2 months ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 1
  • Issues (30d): 7
  • Star history: 781 stars in the last 90 days
