Browser-based chat app for local LLM inference
This project provides a web-based interface for chatting with large language models (LLMs) such as Vicuna directly in the browser, using WebGPU for acceleration. It targets users who want a private, serverless, and easily deployable LLM chat solution, with conversation data stored locally in the browser and PWA support for installation.
How It Works
The application runs entirely client-side, using WebGPU for hardware-accelerated LLM inference. The model runs inside a web worker, so generation never blocks the UI thread and the interface stays responsive. Model weights are cached locally after the initial download, making subsequent loads much faster.
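To make the worker-based design concrete, here is a minimal sketch using only standard Web APIs. It is not taken from the project's source: the file names (`llm-worker.ts`, `main.ts`), the message shapes, and the `runModel` generator are hypothetical placeholders for a real WebGPU-backed inference engine (such as WebLLM).

```ts
// llm-worker.ts (hypothetical) — inference runs here, off the UI thread.
// runModel stands in for a real WebGPU-backed token generator.
async function* runModel(prompt: string): AsyncGenerator<string> {
  for (const word of `You said: ${prompt}`.split(" ")) {
    yield word + " "; // placeholder token stream
  }
}

self.onmessage = async (event: MessageEvent<{ prompt: string }>) => {
  for await (const token of runModel(event.data.prompt)) {
    // Stream partial output back as it is produced, so the chat view can update live.
    postMessage({ type: "token", text: token });
  }
  postMessage({ type: "done" });
};

// main.ts (hypothetical) — the UI thread only exchanges messages and never blocks on inference.
const worker = new Worker(new URL("./llm-worker.ts", import.meta.url), { type: "module" });

worker.onmessage = (event: MessageEvent<{ type: string; text?: string }>) => {
  if (event.data.type === "token") console.log(event.data.text); // append to the chat UI
  if (event.data.type === "done") console.log("[response complete]");
};

worker.postMessage({ prompt: "Hello from the main thread" });
```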
Quick Start & Requirements
A WebGPU-capable browser (Chrome 113 or later) is needed to run inference; Node.js is needed to build and serve the app locally.
npm install && npm run dev
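Before downloading model weights, it can help to verify that WebGPU is actually available. The check below uses only the standard `navigator.gpu` API; it is a generic sketch, not code from this project.

```ts
// Probe WebGPU availability before downloading several GB of model weights.
// Only the standard navigator.gpu API is used; the function name is illustrative.
async function hasWebGPU(): Promise<boolean> {
  if (!("gpu" in navigator)) return false;                        // API not exposed by this browser
  const adapter = await (navigator as any).gpu.requestAdapter();  // null if no usable adapter
  return adapter !== null;
}

hasWebGPU().then((ok) =>
  console.log(ok ? "WebGPU is available" : "WebGPU unavailable: use a Chrome 113+ class browser with a supported GPU"),
);
```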
Highlighted Details
Maintenance & Community
The project is developed by Ryan-yang125. The README does not provide further community or roadmap details.
Licensing & Compatibility
Limitations & Caveats
Requires a browser with WebGPU support and a GPU with enough VRAM for the chosen model; performance degrades on lower-spec hardware. The README lists Vicuna-7b and RedPajama-INCITE-Chat-3B, but only Vicuna-7b is marked as fully supported.