WebGPU inference of GPT models in the browser
WebGPT enables running transformer-based language models directly in the browser using WebGPU, offering a portable and accessible platform for AI experimentation. It targets developers and researchers interested in on-device AI inference, providing a vanilla JavaScript implementation for educational purposes and proof-of-concept applications.
How It Works
WebGPT leverages WebGPU's compute shader capabilities to perform GPT inference directly within the browser. It implements transformer models in pure JavaScript, aiming for efficiency and broad compatibility. Key optimizations include GPU-based embeddings, kernel fusion, and buffer reuse, allowing it to handle models up to 500M parameters with reasonable performance.
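As a concrete illustration of that dispatch pattern (a minimal sketch, not WebGPT's own kernels), the snippet below runs a GPT-2-style GELU activation as a WGSL compute shader: activations live in a GPU storage buffer, a compute pipeline is bound to it, and the result is read back through a staging buffer. All names here are illustrative.

```js
// Minimal WebGPU compute dispatch: an elementwise GELU kernel using the
// tanh approximation from GPT-2. Illustrative sketch only.
const wgsl = `
@group(0) @binding(0) var<storage, read_write> data: array<f32>;
@compute @workgroup_size(64)
fn main(@builtin(global_invocation_id) gid: vec3<u32>) {
  let i = gid.x;
  if (i >= arrayLength(&data)) { return; }
  let x = data[i];
  data[i] = 0.5 * x * (1.0 + tanh(0.7978845608 * (x + 0.044715 * x * x * x)));
}`;

async function geluOnGPU(input) { // input: Float32Array
  const adapter = await navigator.gpu.requestAdapter();
  const device = await adapter.requestDevice();
  const size = input.byteLength;

  // Storage buffer holding the activations; the kernel updates it in place.
  const buf = device.createBuffer({
    size,
    usage: GPUBufferUsage.STORAGE | GPUBufferUsage.COPY_SRC,
  });
  device.queue.writeBuffer(buf, 0, input);

  // Staging buffer for reading the result back to the CPU.
  const readBuf = device.createBuffer({
    size,
    usage: GPUBufferUsage.COPY_DST | GPUBufferUsage.MAP_READ,
  });

  const pipeline = device.createComputePipeline({
    layout: 'auto',
    compute: { module: device.createShaderModule({ code: wgsl }), entryPoint: 'main' },
  });
  const bindGroup = device.createBindGroup({
    layout: pipeline.getBindGroupLayout(0),
    entries: [{ binding: 0, resource: { buffer: buf } }],
  });

  const encoder = device.createCommandEncoder();
  const pass = encoder.beginComputePass();
  pass.setPipeline(pipeline);
  pass.setBindGroup(0, bindGroup);
  pass.dispatchWorkgroups(Math.ceil(input.length / 64)); // one thread per element
  pass.end();
  encoder.copyBufferToBuffer(buf, 0, readBuf, 0, size);
  device.queue.submit([encoder.finish()]);

  await readBuf.mapAsync(GPUMapMode.READ);
  return new Float32Array(readBuf.getMappedRange().slice(0));
}
```

Keeping buffers like `buf` alive and rebinding them across kernel launches, rather than reallocating per step, is the kind of buffer reuse the optimization list refers to.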
Quick Start & Requirements
WebGPT runs entirely client-side and requires a browser with WebGPU support (e.g., recent Chromium-based browsers). See main.js for model loading and execution details. Model conversion scripts are available in misc/conversion_scripts.
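The sketch below shows what driving the model might look like; the GPT class name, weights path, and method names are assumptions for illustration, and the actual entry points are defined in main.js.

```js
// Hypothetical usage sketch: class, path, and method names are assumptions;
// consult main.js for the real entry points. Serve the repo over HTTP and
// open it in a WebGPU-enabled browser, since pages loaded from file:// URLs
// cannot fetch model weights.
const model = new GPT('models/gpt2'); // assumed: path to converted weights
await model.initialize();             // assumed: compiles shaders, uploads weights
const text = await model.generate('The meaning of life is', 50); // prompt, max new tokens
console.log(text);
```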
Highlighted Details
Maintenance & Community
The project appears to be a personal effort driven largely by a single developer. Beyond the "Roadmap / Fixing Stupid Decisions" list in the README, there are no explicit mentions of community channels, a formal roadmap, or ongoing maintenance plans.
Licensing & Compatibility
The README does not explicitly state a license. The vanilla JavaScript and HTML implementation suggests broad compatibility with modern web browsers, though WebGPU support is a hard requirement.
Limitations & Caveats
The project is presented as a proof of concept and educational resource, with several roadmap items indicating ongoing development and optimization. Larger models (e.g., 1.5B parameters) are noted as unstable. Selection operations (top-k, softmax) are not yet GPU-accelerated and currently run on the CPU; a sketch of that step follows.
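To make the caveat concrete, "selection" here is the sampling step that picks the next token from the model's output logits. A minimal CPU-side version (an illustrative sketch, not WebGPT's code) might look like this:

```js
// CPU-side top-k sampling: softmax over the k largest logits, then draw one
// token id from the renormalized distribution. Illustrative sketch only.
function sampleTopK(logits, k = 40, temperature = 1.0) { // logits: Float32Array
  // Indices of the k largest logits.
  const topk = [...logits.keys()]
    .sort((a, b) => logits[b] - logits[a])
    .slice(0, k);

  // Numerically stable softmax restricted to the top-k logits.
  const max = logits[topk[0]];
  const exps = topk.map((i) => Math.exp((logits[i] - max) / temperature));
  const sum = exps.reduce((a, b) => a + b, 0);

  // Sample one index proportionally to its softmax weight.
  let r = Math.random() * sum;
  for (let j = 0; j < topk.length; j++) {
    r -= exps[j];
    if (r <= 0) return topk[j];
  }
  return topk[topk.length - 1];
}
```

Because this step runs once per generated token over the full vocabulary, keeping it on the CPU forces a GPU-to-CPU readback of the logits on every step, which is part of why GPU-accelerating it is on the roadmap.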