kotaemon by Cinnamon

Open-source RAG UI for chatting with documents, targeting both end-users and developers

created 1 year ago
22,876 stars

Top 1.9% on sourcepulse

Project Summary

Kotaemon provides an open-source, customizable RAG UI for document-based question answering, targeting both end-users seeking a chat interface for their documents and developers building RAG pipelines. It offers a clean UI, supports various LLMs (API-based and local via Ollama/llama-cpp-python), and facilitates RAG pipeline development with features like hybrid retrieval, multi-modal QA, and advanced citations.

How It Works

Kotaemon implements a RAG pipeline with a hybrid retriever combining full-text and vector search, augmented by re-ranking for improved retrieval quality. It supports complex reasoning methods like question decomposition and agent-based reasoning (ReAct, ReWOO). The architecture is built on Gradio, allowing for a customizable UI and extensible RAG pipeline strategies, including GraphRAG indexing.
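
To make the hybrid retrieval idea concrete, here is a minimal, self-contained sketch. It is not Kotaemon's implementation: the function names are hypothetical, term overlap stands in for full-text search, bag-of-words cosine similarity stands in for embedding search, reciprocal rank fusion is an assumed way of merging the two rankings, and the re-ranker is a trivial placeholder for a learned model.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (illustrative only)."""
    return text.lower().split()


def keyword_rank(query, docs):
    """Rank document indices by term overlap, a stand-in for full-text search."""
    q = set(tokenize(query))
    scores = [sum(1 for t in tokenize(d) if t in q) for d in docs]
    return sorted(range(len(docs)), key=lambda i: -scores[i])


def vector_rank(query, docs):
    """Rank by cosine similarity of bag-of-words vectors, a stand-in for embeddings."""
    def cosine(a, b):
        dot = sum(a[t] * b[t] for t in a)
        na = math.sqrt(sum(v * v for v in a.values()))
        nb = math.sqrt(sum(v * v for v in b.values()))
        return dot / (na * nb) if na and nb else 0.0

    qv = Counter(tokenize(query))
    sims = [cosine(qv, Counter(tokenize(d))) for d in docs]
    return sorted(range(len(docs)), key=lambda i: -sims[i])


def reciprocal_rank_fusion(rankings, k=60):
    """Merge several rankings; each list contributes 1 / (k + rank) per document."""
    fused = {}
    for ranking in rankings:
        for rank, idx in enumerate(ranking, start=1):
            fused[idx] = fused.get(idx, 0.0) + 1.0 / (k + rank)
    return sorted(fused, key=lambda i: -fused[i])


def rerank(query, candidates, docs):
    """Placeholder re-ranker: order candidates by the share of query terms they cover."""
    q = set(tokenize(query))

    def coverage(i):
        return len(q & set(tokenize(docs[i]))) / len(q) if q else 0.0

    return sorted(candidates, key=lambda i: -coverage(i))


def hybrid_retrieve(query, docs, top_k=2):
    """Fuse keyword and vector rankings, re-rank the top candidates, return their texts."""
    fused = reciprocal_rank_fusion([keyword_rank(query, docs), vector_rank(query, docs)])
    return [docs[i] for i in rerank(query, fused[:top_k], docs)]


docs = [
    "the cat sat on the mat",
    "dogs chase cats in the park",
    "hybrid retrieval combines keyword and vector search",
]
results = hybrid_retrieve("hybrid vector search", docs, top_k=1)
```

In a real pipeline the two rankers would typically be a full-text index and an embedding store, and the re-ranker would be a dedicated model; the fusion step is where the two signals are combined.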

Quick Start & Requirements

  • Install: Via Docker (recommended) or from source (pip install -e "libs/kotaemon[all]", pip install -e "libs/ktem").
  • Prerequisites: Python >= 3.10. Docker is optional but recommended. The unstructured library is needed to process file types other than .pdf, .html, .mhtml, and .xlsx.
  • Resources: Live demos available at Hugging Face Spaces. User and Developer guides are linked.
  • Docker Images: ghcr.io/cinnamon/kotaemon:main-full, ghcr.io/cinnamon/kotaemon:main-ollama, ghcr.io/cinnamon/kotaemon:main-lite. Supports linux/amd64 and linux/arm64.
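
The prerequisites above can be checked before installing from source. The following is a small sketch under stated assumptions: the helper names are hypothetical and not part of Kotaemon, and the extension list simply mirrors the four file types named above.

```python
import sys
from pathlib import Path

# File types the summary lists as supported without the optional `unstructured` library.
BUILTIN_EXTENSIONS = {".pdf", ".html", ".mhtml", ".xlsx"}

# Minimum Python version stated in the prerequisites.
MIN_PYTHON = (3, 10)


def python_ok(version_info=sys.version_info):
    """Return True if the interpreter meets the stated Python >= 3.10 requirement."""
    return tuple(version_info[:2]) >= MIN_PYTHON


def needs_unstructured(filename):
    """Return True if the file type falls outside the four built-in extensions."""
    return Path(filename).suffix.lower() not in BUILTIN_EXTENSIONS
```

For example, `needs_unstructured("notes.docx")` returns `True`, while `needs_unstructured("paper.pdf")` returns `False`.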

Highlighted Details

  • Supports multi-user login, private/public collections, and shared chats.
  • Offers advanced citations with in-browser PDF preview and highlights.
  • Integrates with external RAG frameworks like NanoGraphRAG and LightRAG.
  • Features a configurable settings UI for retrieval and generation parameters.

Maintenance & Community

The project is actively developed by "The Kotaemon Team." Feedback and contributions are welcomed via GitHub issues and a contributing guide.

Licensing & Compatibility

The project is licensed under the MIT License, permitting commercial use and integration with closed-source projects.

Limitations & Caveats

Installation of optional dependencies like unstructured or specific RAG integrations (e.g., nano-graphrag, LightRAG) may introduce Python package version conflicts, requiring manual resolution. Official MS GraphRAG indexing is limited to OpenAI or Ollama APIs.

Health Check

  • Last commit: 4 weeks ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 6
  • Issues (30d): 4

Star History

  • 815 stars in the last 90 days
