LLMChat by c0sogi

Full-stack web UI for LLMs (ChatGPT, LLaMA, etc.)

created 2 years ago
283 stars

Top 93.3% on sourcepulse

View on GitHub
Project Summary

This project provides a full-stack web UI for interacting with large language models (LLMs) like ChatGPT and local models such as LLaMA. It targets developers and power users looking for a customizable, extensible chat interface with features like web browsing, persistent memory via vector embeddings, and auto-summarization.

How It Works

The backend is built with Python's FastAPI, offering high performance and asynchronous request handling. The frontend is developed with Flutter, providing a rich, cross-platform UI. The frontend and backend communicate in real time over WebSockets. Key features include integration with OpenAI's API, support for local LLMs (llama.cpp, Exllama), vector storage in Redis for conversational memory, and web browsing via DuckDuckGo.
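
As a rough illustration of that real-time loop, here is a minimal FastAPI WebSocket endpoint; the route, message format, and generate_reply() are assumptions for the sketch, not the project's actual code:

```python
# Minimal sketch of a FastAPI WebSocket chat endpoint (illustrative only;
# the route, message format, and generate_reply() are assumptions, not
# LLMChat's actual code).
from fastapi import FastAPI, WebSocket, WebSocketDisconnect

app = FastAPI()

async def generate_reply(prompt: str) -> str:
    # Placeholder for a call into an LLM backend (OpenAI API, llama.cpp, ...).
    return f"echo: {prompt}"

@app.websocket("/ws/chat")
async def chat(websocket: WebSocket):
    await websocket.accept()
    try:
        while True:
            # Receive a user message from the Flutter client...
            message = await websocket.receive_text()
            # ...and push the model's reply back over the same socket.
            reply = await generate_reply(message)
            await websocket.send_text(reply)
    except WebSocketDisconnect:
        pass
```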

Quick Start & Requirements

  • Installation: git clone --recurse-submodules https://github.com/c0sogi/llmchat.git, then cd LLMChat and docker-compose -f docker-compose-local.yaml up.
  • Prerequisites: Docker and Docker Compose. Python 3.11 is required if running without Docker (though Docker is still needed for DBs).
  • Setup: Initial server startup may take a few minutes.
  • Access: Backend API docs at http://localhost:8000/docs, chat interface at http://localhost:8000/chat (a quick readiness check is sketched after this list).
  • Local LLMs: Requires downloading GGML or GPTQ model files from Hugging Face and placing them in the llama_models directory. Exllama requires an NVIDIA CUDA GPU.
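
Once the containers are up (which can take a few minutes on first start), a quick way to confirm the backend is serving, assuming the default port 8000:

```python
# Quick readiness check against the default local port (8000 is assumed
# from the README; adjust if you changed the compose file).
from urllib.request import urlopen

with urlopen("http://localhost:8000/docs", timeout=5) as resp:
    print("backend up:", resp.status == 200)
```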

Highlighted Details

  • Vector Embeddings: Utilizes Redis and Langchain to store and query text embeddings, enabling LLMs to access conversational context and external data (a conceptual sketch follows this list).
  • Auto Summarization: Condenses conversation history to save tokens; summarization is triggered by message activity and applied transparently to the prompts sent to the LLM (see the pseudocode sketch after this list).
  • Local LLM Support: Integrates with llama.cpp (GGML) and Exllama (GPTQ) for running models locally, offering flexibility beyond cloud APIs (a minimal llama-cpp-python example follows this list).
  • Real-time Communication: Employs WebSockets for seamless, bidirectional chat interactions between the Flutter frontend and LLM backend.
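
To illustrate the vector-memory idea rather than the project's actual Redis/Langchain code: embed each message, keep the vectors, and pull back the most similar past messages by cosine similarity when building the next prompt. embed() below is a stand-in for a real embedding model:

```python
# Illustrative sketch of conversational memory via embeddings (cosine
# similarity over stored vectors). embed() stands in for a real embedding
# model; the project itself uses Redis + Langchain for this.
import numpy as np

memory: list[tuple[str, np.ndarray]] = []  # (text, embedding) pairs

def embed(text: str) -> np.ndarray:
    # Stand-in: a real system would call an embedding model here.
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.normal(size=384)
    return v / np.linalg.norm(v)

def remember(text: str) -> None:
    memory.append((text, embed(text)))

def recall(query: str, k: int = 3) -> list[str]:
    q = embed(query)
    scored = sorted(memory, key=lambda item: float(item[1] @ q), reverse=True)
    return [text for text, _ in scored[:k]]
```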
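
The auto-summarization behavior can be pictured as a token-budget check when assembling the prompt; the threshold and summarize() helper below are assumptions, not the project's implementation:

```python
# Rough sketch of token-saving auto-summarization; max_messages and
# summarize() are assumptions, not LLMChat's actual logic.
def summarize(messages: list[str]) -> str:
    # Stand-in: the real system would ask the LLM to condense these turns.
    return " / ".join(m[:40] for m in messages)

def build_prompt(history: list[str], max_messages: int = 20) -> str:
    if len(history) > max_messages:
        # Condense older turns into a single summary line so the prompt
        # sent to the model stays within its token budget.
        summary = summarize(history[:-max_messages])
        history = [f"[Earlier conversation, summarized] {summary}"] + history[-max_messages:]
    return "\n".join(history)
```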
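
For the local-model path, llama.cpp models are commonly driven from Python via llama-cpp-python; a minimal sketch (the model filename is a placeholder, and LLMChat's own wiring may differ):

```python
# Minimal local-inference sketch using llama-cpp-python
# (pip install llama-cpp-python). The model filename is a placeholder;
# LLMChat expects downloaded model files under llama_models/.
from llama_cpp import Llama

llm = Llama(model_path="llama_models/your-ggml-model.bin", n_ctx=2048)
result = llm("Q: What does LLMChat do?\nA:", max_tokens=128, stop=["Q:"])
print(result["choices"][0]["text"])
```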

Maintenance & Community

The project is maintained by a single author, c0sogi; per the health check below, the last commit was about a year ago. Links to community resources such as Discord or Slack are not provided in the README.

Licensing & Compatibility

  • License: MIT License.
  • Compatibility: The permissive MIT terms allow commercial use and integration with closed-source applications.

Limitations & Caveats

Running local LLMs demands substantial hardware, and Exllama in particular requires an NVIDIA CUDA GPU. The README notes that local LLMs cannot serve multiple requests concurrently because inference is computationally expensive; a semaphore therefore limits processing to one request at a time (see the sketch below).
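
This one-at-a-time constraint maps onto a standard asyncio pattern: guard the local inference call with a single-slot semaphore so concurrent chat sessions queue instead of overlapping. A minimal sketch, with names that are illustrative rather than taken from the project:

```python
# Illustrative single-slot semaphore that serializes local inference so
# concurrent chat sessions are handled one at a time (names here are
# assumptions, not LLMChat's actual code).
import asyncio

local_llm_slot = asyncio.Semaphore(1)

def blocking_generate(prompt: str) -> str:
    # Stand-in for the CPU/GPU-heavy llama.cpp or Exllama call.
    return f"(generated reply to: {prompt})"

async def run_local_llm(prompt: str) -> str:
    async with local_llm_slot:  # only one request holds this at a time
        return await asyncio.to_thread(blocking_generate, prompt)
```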

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 7 stars in the last 90 days
