WebGPT by 0hq

WebGPU inference of GPT models in the browser

Created 2 years ago
3,730 stars

Top 13.0% on SourcePulse

View on GitHub
Project Summary

WebGPT enables running transformer-based language models directly in the browser using WebGPU, offering a portable and accessible platform for AI experimentation. It targets developers and researchers interested in on-device AI inference, providing a vanilla JavaScript implementation for educational purposes and proof-of-concept applications.

How It Works

WebGPT leverages WebGPU's compute shader capabilities to perform GPT inference directly within the browser. It implements transformer models in pure JavaScript, aiming for efficiency and broad compatibility. Key optimizations include GPU-based embeddings, kernel fusion, and buffer reuse, allowing it to handle models up to 500M parameters with reasonable performance.
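The kernel-fusion optimization mentioned above combines steps like a matrix multiply, bias add, and activation into a single GPU pass, so the intermediate result never round-trips through a buffer. As a rough CPU reference for what such a fused kernel computes (illustrative function names, not WebGPT's actual API; the real kernels are WGSL compute shaders in the repository):

```javascript
// CPU reference for a fused "matmul + bias + GELU" kernel. On the GPU this
// runs as one WGSL compute shader, so the intermediate (xW + b) matrix is
// never written to a separate buffer. Names here are illustrative only.

function gelu(x) {
  // tanh approximation of GELU, as used in GPT-2
  return 0.5 * x * (1 + Math.tanh(Math.sqrt(2 / Math.PI) * (x + 0.044715 * x ** 3)));
}

function fusedMatmulBiasGelu(x, W, b, n, k, m) {
  // x: n*k, W: k*m, b: m — all flat row-major arrays
  const out = new Array(n * m);
  for (let i = 0; i < n; i++) {
    for (let j = 0; j < m; j++) {
      let acc = b[j];
      for (let p = 0; p < k; p++) acc += x[i * k + p] * W[p * m + j];
      out[i * m + j] = gelu(acc); // activation applied in the same pass
    }
  }
  return out;
}
```

Fusing the activation into the matmul pass saves one buffer write and one buffer read per element, which is the kind of bandwidth saving that matters most on browser-hosted GPU workloads.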

Quick Start & Requirements

  • Install/Run: Clone the repository and open the HTML files in a WebGPU-compatible browser (Chrome Canary or Edge Canary recommended).
  • Prerequisites: A WebGPU-enabled browser (e.g., Chrome Canary v113+) and Git LFS to download the model files.
  • Demo: KMeans.org
  • Docs: See main.js for model loading and execution details. Model conversion scripts are available in misc/conversion_scripts.

Highlighted Details

  • Achieves 3ms/token with 5M parameters (f32) on an M1 Mac.
  • Supports models up to 500M parameters, with experimental 1.5B parameter support.
  • Implemented in ~1500 lines of vanilla JavaScript.
  • Includes GPT-Shakespeare and GPT-2 117M models.

Maintenance & Community

The project appears to be a personal effort maintained by a single developer. Beyond the README's "Roadmap / Fixing Stupid Decisions" section, there are no community channels, published roadmaps, or stated ongoing maintenance plans.

Licensing & Compatibility

The README does not explicitly state a license. Although the implementation is vanilla JavaScript and HTML, browser compatibility is bounded by WebGPU support rather than by the language itself.

Limitations & Caveats

The project is presented as a proof-of-concept and educational resource, with some roadmap items indicating ongoing development and optimization. Larger models (e.g., 1.5B parameters) are noted as unstable. Certain operations like selection ops (topk, softmax) are not yet GPU-accelerated.
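For context on that last caveat, the selection step that currently runs on the CPU amounts to a softmax over the logits followed by a top-k weighted sample. A minimal sketch of the operation (simplified, assumed logic; not WebGPT's actual code):

```javascript
// CPU-side selection: softmax restricted to the top-k logits, then a
// weighted random pick. Illustrative of the op WebGPT's roadmap wants
// moved to the GPU; not the project's actual implementation.

function softmaxTopK(logits, k) {
  // indices of the k largest logits, descending
  const idx = logits
    .map((v, i) => i)
    .sort((a, b) => logits[b] - logits[a])
    .slice(0, k);
  // numerically stable softmax over just those entries
  const max = logits[idx[0]];
  const exps = idx.map((i) => Math.exp(logits[i] - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return idx.map((i, j) => ({ token: i, prob: exps[j] / sum }));
}

function sampleTopK(logits, k) {
  const dist = softmaxTopK(logits, k);
  let r = Math.random();
  for (const { token, prob } of dist) {
    r -= prob;
    if (r <= 0) return token;
  }
  return dist[dist.length - 1].token; // guard against float rounding
}
```

Sorting-based selection like this is awkward to express as a data-parallel shader, which is part of why it tends to be the last piece of an inference pipeline moved onto the GPU.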

Health Check

  • Last Commit: 1 year ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 16 stars in the last 30 days

Explore Similar Projects

Starred by Luca Soldaini (Research Scientist at Ai2), Edward Sun (Research Scientist at Meta Superintelligence Lab), and 4 more.

parallelformers by tunib-ai

0%
790
Toolkit for easy model parallelization
Created 4 years ago
Updated 2 years ago
Starred by Tobi Lutke (Cofounder of Shopify), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 11 more.

ctransformers by marella

0.1%
2k
Python bindings for fast Transformer model inference
Created 2 years ago
Updated 1 year ago
Starred by Andrej Karpathy (Founder of Eureka Labs; Formerly at Tesla, OpenAI; Author of CS 231n), Jason Knight (Director AI Compilers at NVIDIA; Cofounder of OctoML), and 3 more.

gpu.cpp by AnswerDotAI

0%
4k
C++ library for portable GPU computation using WebGPU
Created 1 year ago
Updated 2 months ago
Starred by Alex Yu (Research Scientist at OpenAI; Former Cofounder of Luma AI) and Cody Yu (Coauthor of vLLM; MTS at OpenAI).

xDiT by xdit-project

0.7%
2k
Inference engine for parallel Diffusion Transformer (DiT) deployment
Created 1 year ago
Updated 1 day ago
Starred by Omar Sanseviero (DevRel at Google DeepMind), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 11 more.

petals by bigscience-workshop

0.1%
10k
Run LLMs at home, BitTorrent-style
Created 3 years ago
Updated 1 year ago
Starred by Luis Capelo (Cofounder of Lightning AI), Patrick von Platen (Author of Hugging Face Diffusers; Research Engineer at Mistral), and 4 more.

ktransformers by kvcache-ai

0.3%
15k
Framework for LLM inference optimization experimentation
Created 1 year ago
Updated 2 days ago