LLMChat by c0sogi

Full-stack web UI for LLMs (ChatGPT, LLaMA, etc.)

created 2 years ago
283 stars

Top 93.3% on sourcepulse

View on GitHub
Project Summary

This project provides a full-stack web UI for interacting with large language models (LLMs) like ChatGPT and local models such as LLaMA. It targets developers and power users looking for a customizable, extensible chat interface with features like web browsing, persistent memory via vector embeddings, and auto-summarization.

How It Works

The backend is built with Python's FastAPI, offering high performance and asynchronous request handling. The frontend is developed with Flutter, providing a rich, cross-platform UI. The frontend and backend communicate in real time over WebSockets. Key features include integration with OpenAI's API, support for local LLMs (llama.cpp, Exllama), vector storage in Redis for conversational memory, and web browsing via DuckDuckGo.
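
As a rough illustration of that real-time loop, here is a minimal FastAPI WebSocket endpoint; the route, message format, and generate_reply() are assumptions for the sketch, not the project's actual code:

```python
# Minimal sketch of a FastAPI WebSocket chat endpoint (illustrative only;
# the route, message format, and generate_reply() are assumptions, not
# LLMChat's actual code).
from fastapi import FastAPI, WebSocket, WebSocketDisconnect

app = FastAPI()

async def generate_reply(prompt: str) -> str:
    # Placeholder for a call into an LLM backend (OpenAI API, llama.cpp, ...).
    return f"echo: {prompt}"

@app.websocket("/ws/chat")
async def chat(websocket: WebSocket):
    await websocket.accept()
    try:
        while True:
            # Receive a user message from the Flutter client...
            message = await websocket.receive_text()
            # ...and push the model's reply back over the same socket.
            reply = await generate_reply(message)
            await websocket.send_text(reply)
    except WebSocketDisconnect:
        pass
```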

Quick Start & Requirements

  • Installation: git clone --recurse-submodules https://github.com/c0sogi/llmchat.git, then cd LLMChat and docker-compose -f docker-compose-local.yaml up.
  • Prerequisites: Docker and Docker Compose. Python 3.11 is required if running without Docker (though Docker is still needed for DBs).
  • Setup: Initial server startup may take a few minutes.
  • Access: Backend API docs at http://localhost:8000/docs, chat interface at http://localhost:8000/chat (a quick readiness check is sketched after this list).
  • Local LLMs: Requires downloading GGML or GPTQ model files from Hugging Face and placing them in the llama_models directory. Exllama requires an NVIDIA CUDA GPU.
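
Once the containers are up (which can take a few minutes on first start), a quick way to confirm the backend is serving, assuming the default port 8000:

```python
# Quick readiness check against the default local port (8000 is assumed
# from the README; adjust if you changed the compose file).
from urllib.request import urlopen

with urlopen("http://localhost:8000/docs", timeout=5) as resp:
    print("backend up:", resp.status == 200)
```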

Highlighted Details

  • Vector Embeddings: Utilizes Redis and Langchain to store and query text embeddings, enabling LLMs to access conversational context and external data (a conceptual sketch follows this list).
  • Auto Summarization: Condenses conversation history to save tokens; summarization is triggered by message activity and applied transparently to the prompts sent to the LLM (see the pseudocode sketch after this list).
  • Local LLM Support: Integrates with llama.cpp (GGML) and Exllama (GPTQ) for running models locally, offering flexibility beyond cloud APIs (a minimal llama-cpp-python example follows this list).
  • Real-time Communication: Employs WebSockets for seamless, bidirectional chat interactions between the Flutter frontend and LLM backend.
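
To illustrate the vector-memory idea rather than the project's actual Redis/Langchain code: embed each message, keep the vectors, and pull back the most similar past messages by cosine similarity when building the next prompt. embed() below is a stand-in for a real embedding model:

```python
# Illustrative sketch of conversational memory via embeddings (cosine
# similarity over stored vectors). embed() stands in for a real embedding
# model; the project itself uses Redis + Langchain for this.
import numpy as np

memory: list[tuple[str, np.ndarray]] = []  # (text, embedding) pairs

def embed(text: str) -> np.ndarray:
    # Stand-in: a real system would call an embedding model here.
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.normal(size=384)
    return v / np.linalg.norm(v)

def remember(text: str) -> None:
    memory.append((text, embed(text)))

def recall(query: str, k: int = 3) -> list[str]:
    q = embed(query)
    scored = sorted(memory, key=lambda item: float(item[1] @ q), reverse=True)
    return [text for text, _ in scored[:k]]
```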
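
The auto-summarization behavior can be pictured as a token-budget check when assembling the prompt; the threshold and summarize() helper below are assumptions, not the project's implementation:

```python
# Rough sketch of token-saving auto-summarization; max_messages and
# summarize() are assumptions, not LLMChat's actual logic.
def summarize(messages: list[str]) -> str:
    # Stand-in: the real system would ask the LLM to condense these turns.
    return " / ".join(m[:40] for m in messages)

def build_prompt(history: list[str], max_messages: int = 20) -> str:
    if len(history) > max_messages:
        # Condense older turns into a single summary line so the prompt
        # sent to the model stays within its token budget.
        summary = summarize(history[:-max_messages])
        history = [f"[Earlier conversation, summarized] {summary}"] + history[-max_messages:]
    return "\n".join(history)
```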
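
For the local-model path, llama.cpp models are commonly driven from Python via llama-cpp-python; a minimal sketch (the model filename is a placeholder, and LLMChat's own wiring may differ):

```python
# Minimal local-inference sketch using llama-cpp-python
# (pip install llama-cpp-python). The model filename is a placeholder;
# LLMChat expects downloaded model files under llama_models/.
from llama_cpp import Llama

llm = Llama(model_path="llama_models/your-ggml-model.bin", n_ctx=2048)
result = llm("Q: What does LLMChat do?\nA:", max_tokens=128, stop=["Q:"])
print(result["choices"][0]["text"])
```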

Maintenance & Community

The project is maintained by a single author, c0sogi; per the health check below, the last commit was about a year ago. Links to community resources such as Discord or Slack are not provided in the README.

Licensing & Compatibility

  • License: MIT License.
  • Compatibility: The permissive MIT terms allow commercial use and integration with closed-source applications.

Limitations & Caveats

Running local LLMs demands substantial hardware, and Exllama in particular requires an NVIDIA CUDA GPU. The README notes that local LLMs cannot serve multiple requests concurrently because inference is computationally expensive; a semaphore therefore limits processing to one request at a time (see the sketch below).
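
This one-at-a-time constraint maps onto a standard asyncio pattern: guard the local inference call with a single-slot semaphore so concurrent chat sessions queue instead of overlapping. A minimal sketch, with names that are illustrative rather than taken from the project:

```python
# Illustrative single-slot semaphore that serializes local inference so
# concurrent chat sessions are handled one at a time (names here are
# assumptions, not LLMChat's actual code).
import asyncio

local_llm_slot = asyncio.Semaphore(1)

def blocking_generate(prompt: str) -> str:
    # Stand-in for the CPU/GPU-heavy llama.cpp or Exllama call.
    return f"(generated reply to: {prompt})"

async def run_local_llm(prompt: str) -> str:
    async with local_llm_slot:  # only one request holds this at a time
        return await asyncio.to_thread(blocking_generate, prompt)
```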

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 7 stars in the last 90 days
