Chatbot web app for LLM inference using Petals client
This project provides a web application and API endpoints for interacting with large language models (LLMs) via the Petals distributed inference framework. It targets developers and researchers looking to easily integrate LLM capabilities into their applications or experiment with different models without managing complex infrastructure. The primary benefit is simplified access to powerful LLMs through a user-friendly interface and efficient API.
How It Works
The backend exposes both WebSocket and HTTP APIs for LLM inference. The WebSocket API is recommended for its speed and resource efficiency, supporting streaming token generation and interactive chatbot functionalities. It uses a JSON-based protocol for communication, allowing clients to open inference sessions, send prompts, and receive generated text. The HTTP API offers a simpler POST request interface for generating text. Both APIs leverage the Petals client to connect to a distributed network of GPUs for inference.
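To make the flow concrete, here is a minimal Python sketch of a WebSocket client following the description above. The endpoint path (/api/v2/generate), the message fields (open_inference_session, generate, inputs, max_new_tokens, stop), and the model name are assumptions for illustration; the project's own examples are authoritative.

# Hypothetical WebSocket client sketch; endpoint path and field names are assumptions.
import asyncio
import json
import websockets  # pip install websockets

async def chat():
    async with websockets.connect("wss://chat.petals.dev/api/v2/generate") as ws:
        # Open an inference session for a chosen model.
        await ws.send(json.dumps({
            "type": "open_inference_session",
            "model": "meta-llama/Llama-2-70b-chat-hf",  # assumed model id
            "max_length": 1024,
        }))
        print(json.loads(await ws.recv()))  # acknowledgement from the server

        # Send a prompt and read streamed tokens until the server signals completion.
        await ws.send(json.dumps({
            "type": "generate",
            "inputs": "A chat between a human and an AI assistant.\nHuman: Hi!\nAI:",
            "max_new_tokens": 1,  # stream one token per reply
            "temperature": 0.7,
        }))
        while True:
            reply = json.loads(await ws.recv())
            print(reply.get("outputs", ""), end="", flush=True)
            if reply.get("stop") or not reply.get("ok", True):
                break

asyncio.run(chat())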
Quick Start & Requirements
git clone https://github.com/petals-infra/chat.petals.dev.git
cd chat.petals.dev
pip install -r requirements.txt
flask run --host=0.0.0.0 --port=5000  # development server
gunicorn app:app --bind 0.0.0.0:5000 --worker-class gthread --threads 100 --timeout 1000  # production server
Dependencies are listed in requirements.txt. For Llama 2, access to the Meta AI weights and a huggingface-cli login are required.
Highlighted Details
Model settings are configured in config.py.
The README includes curl snippets for API interaction; a Python equivalent is sketched below.
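For comparison with those curl snippets, a rough Python equivalent of the HTTP API call might look like the following; the /api/v1/generate path and the parameter names are assumptions based on the description above.

# Hypothetical HTTP API call; endpoint path and parameter names are assumptions.
import requests

response = requests.post(
    "https://chat.petals.dev/api/v1/generate",
    data={
        "model": "meta-llama/Llama-2-70b-chat-hf",  # assumed model id
        "inputs": "A cat sat on",
        "max_new_tokens": 16,
    },
    timeout=600,  # distributed inference can be slow, so allow a generous timeout
)
response.raise_for_status()
print(response.json())  # expected: a JSON object containing the generated text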
Maintenance & Community
The project is part of the Petals ecosystem, indicating potential community support and ongoing development. Specific contributor or community links are not detailed in the README.
Licensing & Compatibility
The README does not explicitly state a license. Compatibility for commercial use or closed-source linking is not specified.
Limitations & Caveats
The public endpoint at https://chat.petals.dev/api/... is not recommended for production use due to limited throughput and possible discontinuation. Falcon-180B is not supported due to license restrictions. CPU inference performance depends on AVX512 support.
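Since the AVX512 caveat is easy to overlook, a quick Linux-only sanity check (an illustrative snippet, not part of the project) is to inspect /proc/cpuinfo before attempting CPU inference:

# Illustrative check for AVX512 support on Linux (not part of the project).
with open("/proc/cpuinfo") as f:
    print("AVX512 supported:", "avx512" in f.read().lower())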