chat.petals.dev by petals-infra

Chatbot web app for LLM inference using Petals client

created 2 years ago
314 stars

Project Summary

This project provides a web application and API endpoints for interacting with large language models (LLMs) via the Petals distributed inference framework. It targets developers and researchers who want to integrate LLM capabilities into their applications, or experiment with different models, without managing complex infrastructure. The primary benefit is simplified access to powerful LLMs through a user-friendly interface and an efficient API.

How It Works

The backend exposes both WebSocket and HTTP APIs for LLM inference. The WebSocket API is recommended for its speed and resource efficiency, supporting streaming token generation and interactive chatbot functionalities. It uses a JSON-based protocol for communication, allowing clients to open inference sessions, send prompts, and receive generated text. The HTTP API offers a simpler POST request interface for generating text. Both APIs leverage the Petals client to connect to a distributed network of GPUs for inference.
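As a sketch, the JSON messages a WebSocket client exchanges might look like the following. The message types and field names here (open_inference_session, generate, max_length, max_new_tokens) are assumptions based on the README's description, not a verified spec; consult the repository for the authoritative protocol.

```python
import json

# Illustrative sketch of the JSON-based WebSocket protocol described above.
# Message shapes are assumptions drawn from the README's description; check
# the repository docs for the authoritative fields.

def open_session_message(model: str, max_length: int = 1024) -> str:
    """First message: open an inference session for a given model."""
    return json.dumps({
        "type": "open_inference_session",
        "model": model,
        "max_length": max_length,
    })

def generate_message(inputs: str, max_new_tokens: int = 1) -> str:
    """Subsequent messages: send a prompt and request new tokens."""
    return json.dumps({
        "type": "generate",
        "inputs": inputs,
        "max_new_tokens": max_new_tokens,
    })

# A real client would send these over a WebSocket connection (e.g. with the
# `websockets` library) and read streamed replies until generation stops.
```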

Quick Start & Requirements

  • Install and run locally:
    git clone https://github.com/petals-infra/chat.petals.dev.git
    cd chat.petals.dev
    pip install -r requirements.txt
    flask run --host=0.0.0.0 --port=5000
    
  • For production, use Gunicorn:
    gunicorn app:app --bind 0.0.0.0:5000 --worker-class gthread --threads 100 --timeout 1000
    
  • Prerequisites: Python and the dependencies listed in requirements.txt. For Llama 2, access to Meta AI's weights and a huggingface-cli login are required.
  • System Requirements: enough RAM to run the embeddings on CPU, or enough GPU memory to run them on GPU. AVX512 support can improve CPU inference performance.
  • Demo: https://chat.petals.dev
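A minimal Python sketch of a non-streaming request to the HTTP API follows. The endpoint path and form fields are assumptions based on the README's description, and the public endpoint is not meant for production; point the URL at your own deployment for real use.

```python
import urllib.parse
import urllib.request

# Sketch of a non-streaming HTTP POST to the chat backend. The endpoint path
# and form fields below are assumptions drawn from the README; verify them
# against the repository before relying on this.
API_URL = "https://chat.petals.dev/api/v1/generate"

def build_request(prompt: str, model: str, max_new_tokens: int = 64) -> urllib.request.Request:
    """Encode the prompt and parameters as a form-encoded POST request."""
    data = urllib.parse.urlencode({
        "model": model,
        "inputs": prompt,
        "max_new_tokens": max_new_tokens,
    }).encode()
    return urllib.request.Request(API_URL, data=data, method="POST")

# Sending the request (requires network access), with a hypothetical model name:
# with urllib.request.urlopen(build_request("Hello,", "example/model")) as resp:
#     print(resp.read().decode())
```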

Highlighted Details

  • Supports a WebSocket API for faster, more efficient, and streaming inference.
  • Offers an HTTP API for simpler, non-streaming requests.
  • Allows configuration of served models via config.py.
  • Provides example JavaScript and curl snippets for API interaction.
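The README notes that served models are configured via config.py. A hypothetical sketch of what such a configuration could look like is below; the variable names and structure are illustrative assumptions, not the repository's actual schema, so edit the real config.py to change served models.

```python
# Hypothetical sketch of a model configuration in the spirit of config.py.
# Field names and structure are illustrative assumptions, not the actual
# schema used by chat.petals.dev.
from dataclasses import dataclass

@dataclass
class ModelInfo:
    repo: str                 # Hugging Face repository of the model
    name: str                 # display name shown in the chat UI
    public_api: bool = True   # whether the HTTP/WebSocket APIs expose it

MODELS = [
    ModelInfo(repo="example-org/model-a", name="Model A"),
    ModelInfo(repo="example-org/model-b", name="Model B", public_api=False),
]
```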

Maintenance & Community

The project is part of the Petals ecosystem, indicating potential community support and ongoing development. Specific contributor or community links are not detailed in the README.

Licensing & Compatibility

The README does not explicitly state a license. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The public endpoint https://chat.petals.dev/api/... is not recommended for production due to limited throughput and potential discontinuation. Falcon-180B is not supported due to license restrictions. CPU inference performance is dependent on AVX512 support.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 4 stars in the last 90 days
