Chatbot web app for LLM inference using Petals client
This project provides a web application and API endpoints for interacting with large language models (LLMs) via the Petals distributed inference framework. It targets developers and researchers looking to easily integrate LLM capabilities into their applications or experiment with different models without managing complex infrastructure. The primary benefit is simplified access to powerful LLMs through a user-friendly interface and efficient API.
How It Works
The backend exposes both WebSocket and HTTP APIs for LLM inference. The WebSocket API is recommended for its speed and resource efficiency, supporting streaming token generation and interactive chatbot functionalities. It uses a JSON-based protocol for communication, allowing clients to open inference sessions, send prompts, and receive generated text. The HTTP API offers a simpler POST request interface for generating text. Both APIs leverage the Petals client to connect to a distributed network of GPUs for inference.
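To make the flow concrete, here is a minimal Python sketch of a WebSocket client following the description above. The endpoint path (/api/v2/generate), the message fields (open_inference_session, generate, inputs, max_new_tokens, stop), and the model name are assumptions for illustration; the project's own examples are authoritative.

# Hypothetical WebSocket client sketch; endpoint path and field names are assumptions.
import asyncio
import json
import websockets  # pip install websockets

async def chat():
    async with websockets.connect("wss://chat.petals.dev/api/v2/generate") as ws:
        # Open an inference session for a chosen model.
        await ws.send(json.dumps({
            "type": "open_inference_session",
            "model": "meta-llama/Llama-2-70b-chat-hf",  # assumed model id
            "max_length": 1024,
        }))
        print(json.loads(await ws.recv()))  # acknowledgement from the server

        # Send a prompt and read streamed tokens until the server signals completion.
        await ws.send(json.dumps({
            "type": "generate",
            "inputs": "A chat between a human and an AI assistant.\nHuman: Hi!\nAI:",
            "max_new_tokens": 1,  # stream one token per reply
            "temperature": 0.7,
        }))
        while True:
            reply = json.loads(await ws.recv())
            print(reply.get("outputs", ""), end="", flush=True)
            if reply.get("stop") or not reply.get("ok", True):
                break

asyncio.run(chat())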
Quick Start & Requirements
git clone https://github.com/petals-infra/chat.petals.dev.git
cd chat.petals.dev
pip install -r requirements.txt
flask run --host=0.0.0.0 --port=5000  # development server
gunicorn app:app --bind 0.0.0.0:5000 --worker-class gthread --threads 100 --timeout 1000  # production server
Dependencies are listed in requirements.txt. For Llama 2, access to the Meta AI weights and a huggingface-cli login are required.
Highlighted Details
Model settings are configured in config.py.
The README includes curl snippets for API interaction; a Python equivalent is sketched below.
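For comparison with those curl snippets, a rough Python equivalent of the HTTP API call might look like the following; the /api/v1/generate path and the parameter names are assumptions based on the description above.

# Hypothetical HTTP API call; endpoint path and parameter names are assumptions.
import requests

response = requests.post(
    "https://chat.petals.dev/api/v1/generate",
    data={
        "model": "meta-llama/Llama-2-70b-chat-hf",  # assumed model id
        "inputs": "A cat sat on",
        "max_new_tokens": 16,
    },
    timeout=600,  # distributed inference can be slow, so allow a generous timeout
)
response.raise_for_status()
print(response.json())  # expected: a JSON object containing the generated text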
Maintenance & Community
The project is part of the Petals ecosystem, indicating potential community support and ongoing development. Specific contributor or community links are not detailed in the README.
Licensing & Compatibility
The README does not explicitly state a license. Compatibility for commercial use or closed-source linking is not specified.
Limitations & Caveats
The public endpoint at https://chat.petals.dev/api/... is not recommended for production use due to limited throughput and possible discontinuation. Falcon-180B is not supported due to license restrictions. CPU inference performance depends on AVX512 support.
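Since the AVX512 caveat is easy to overlook, a quick Linux-only sanity check (an illustrative snippet, not part of the project) is to inspect /proc/cpuinfo before attempting CPU inference:

# Illustrative check for AVX512 support on Linux (not part of the project).
with open("/proc/cpuinfo") as f:
    print("AVX512 supported:", "avx512" in f.read().lower())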