Browser-based demo of GPT-2 inference
This project provides a browser-based, WebGL2 implementation of the GPT-2 small (117M) model, enabling inference directly in the user's web browser. It targets developers and researchers interested in on-device AI inference and interactive visualization of transformer models. The key benefit is running a significant portion of the GPT-2 forward pass on the GPU via WebGL2 shaders, with BPE tokenization handled client-side using js-tiktoken.
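For context, here is a hypothetical sketch of the client-side tokenization step using js-tiktoken's GPT-2 encoding; the exact wiring in this repo may differ, but the library API shown is real:

```ts
import { getEncoding } from "js-tiktoken";

// GPT-2 uses the "gpt2" BPE vocabulary; js-tiktoken bundles the ranks
// in pure JavaScript, so no WASM module or server round-trip is needed.
const enc = getEncoding("gpt2");

const ids = enc.encode("Hello, world!");
console.log(ids);             // [15496, 11, 995, 0]
console.log(enc.decode(ids)); // "Hello, world!"
```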
How It Works
The core of the implementation leverages WebGL2 shaders to execute the GPT-2 forward pass, including the transformer blocks and attention mechanisms, directly on the GPU. This approach offloads computation from the CPU, potentially offering faster inference and enabling visualization of internal model states such as attention matrices. Tokenization is handled client-side using js-tiktoken, avoiding the need for WASM or server-side processing.
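To make the shader approach concrete, below is a minimal sketch of the kind of GEMM fragment shader such a pipeline typically relies on: each output pixel computes one element of C = A × B by looping over the shared inner dimension. The texture layout (one float per texel in the red channel) and all names are illustrative assumptions, not this repo's actual shader code.

```ts
// Sketch only: GLSL ES 3.00 fragment shader computing C = A * B, where
// A (M x K) and B (K x N) are stored one float per texel in the red
// channel of two input textures.
const matmulFrag = `#version 300 es
precision highp float;

uniform sampler2D A;   // M x K matrix
uniform sampler2D B;   // K x N matrix
uniform int K;         // shared inner dimension

out vec4 outColor;

void main() {
  // gl_FragCoord maps each output pixel to one element (col, row) of C.
  ivec2 pos = ivec2(gl_FragCoord.xy);
  float acc = 0.0;
  for (int k = 0; k < K; ++k) {
    acc += texelFetch(A, ivec2(k, pos.y), 0).r *
           texelFetch(B, ivec2(pos.x, k), 0).r;
  }
  outColor = vec4(acc, 0.0, 0.0, 1.0);
}`;
```

Drawing a full-viewport quad into an M × N floating-point framebuffer then evaluates every element of C in parallel on the GPU; a full forward pass chains many such passes (matmuls, softmax, layer norm) through intermediate textures.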
Quick Start & Requirements
```bash
# Python side: fetch the pretrained GPT-2 weights
pip install torch numpy transformers
python download_weights.py

# JS side: install dependencies and start the dev server
npm install
npm run dev
```
Highlighted Details
BPE tokenization runs entirely client-side via js-tiktoken in the browser.
Maintenance & Community
No specific information on maintainers, community channels, or roadmap is provided in the README.
Licensing & Compatibility
No license information is provided in the README. Compatibility hinges on browser WebGL2 support (see Limitations & Caveats below).
Limitations & Caveats
The implementation is specifically for GPT-2 small (117M) and may not be directly applicable to larger models without significant modifications. The README does not detail performance benchmarks or specific browser compatibility nuances beyond requiring WebGL2 support.