Knowledge-Base-Self-Hosting-Kit  by 2dogsandanerd

RAG system for private code and document querying

Created 5 months ago
253 stars

Top 99.3% on SourcePulse

Project Summary

This project provides a production-ready, self-hosted Retrieval-Augmented Generation (RAG) system designed for ingesting and querying codebases and documentation with a focus on privacy and zero configuration. It targets developers and power users needing a private, local LLM-integrated knowledge base solution, offering a robust alternative to cloud-based services.

How It Works

The system leverages a Docker-powered architecture, combining ChromaDB for vector storage and Docling for document ingestion. It employs a hybrid chunking strategy (vector + BM25) to process diverse data types, including PDFs and code repositories, enabling it to differentiate between code and prose. A FastAPI backend exposes CRUD, ingestion, and search APIs, while a modern UI provides dashboards, ingestion tools, and agent configuration capabilities.
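The pairing of vector and BM25 signals described above can be illustrated with a small, dependency-free sketch. It is not the project's code. The summary does not say how the two signals are combined, so reciprocal rank fusion is swapped in here as an assumption, and `toy_embed` (character-trigram counts) is only a stand-in for a real embedding model backed by ChromaDB. All function names are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer; a real pipeline would be smarter."""
    return text.lower().split()


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score every document against the query with Okapi BM25."""
    if not docs:
        return []
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    avg_len = (sum(len(t) for t in tokenized) / n) or 1.0
    doc_freq = Counter(term for t in tokenized for term in set(t))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            if tf[term] == 0:
                continue
            idf = math.log(1 + (n - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            denom = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / denom
        scores.append(score)
    return scores


def toy_embed(text):
    """Assumption: stand-in for a dense embedding (sparse trigram counts)."""
    s = text.lower()
    return Counter(s[i:i + 3] for i in range(len(s) - 2))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm = math.sqrt(sum(c * c for c in a.values())) * math.sqrt(
        sum(c * c for c in b.values())
    )
    return dot / norm if norm else 0.0


def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of document ids into one ranking."""
    fused = Counter()
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] += 1.0 / (k + rank)
    return [doc_id for doc_id, _ in fused.most_common()]


def hybrid_search(query, docs, top_k=3):
    """Return indices of the top_k documents under both signals."""
    lexical = bm25_scores(query, docs)
    q_vec = toy_embed(query)
    semantic = [cosine(q_vec, toy_embed(d)) for d in docs]
    by_lexical = sorted(range(len(docs)), key=lambda i: lexical[i], reverse=True)
    by_semantic = sorted(range(len(docs)), key=lambda i: semantic[i], reverse=True)
    return reciprocal_rank_fusion([by_lexical, by_semantic])[:top_k]
```

Rank fusion is used in this sketch because BM25 scores and cosine similarities live on different scales, so summing them directly would need extra normalization.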

Quick Start & Requirements

  • Install/Run: Clone the repository, copy .env.example to .env and configure settings (e.g., DOCS_DIR, LLM provider), then run docker compose up -d.
  • Prerequisites: Docker is essential for running the orchestrated services (ChromaDB, Nginx, FastAPI backend, etc.).
  • Resource Footprint: Requires sufficient resources for Docker containers and local LLM execution.
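The copy-and-configure step can be sketched as a small helper that derives a .env file from .env.example and overrides selected keys. This is a minimal illustration, not part of the project: `render_env` and `write_env` are hypothetical names, and DOCS_DIR is the only key taken from the summary above.

```python
from pathlib import Path


def render_env(example_text, overrides):
    """Return .env content: the example lines with chosen KEY=value pairs replaced.

    Keys in `overrides` that do not appear in the example are appended at the end.
    """
    remaining = dict(overrides)
    out = []
    for line in example_text.splitlines():
        stripped = line.strip()
        # Only touch real assignments; keep comments and blank lines as they are.
        if stripped and not stripped.startswith("#") and "=" in stripped:
            key = stripped.split("=", 1)[0].strip()
            if key in remaining:
                line = f"{key}={remaining.pop(key)}"
        out.append(line)
    out.extend(f"{key}={value}" for key, value in remaining.items())
    return "\n".join(out) + "\n"


def write_env(repo_dir, overrides):
    """Read <repo_dir>/.env.example and write <repo_dir>/.env with overrides applied."""
    repo = Path(repo_dir)
    example = (repo / ".env.example").read_text(encoding="utf-8")
    target = repo / ".env"
    target.write_text(render_env(example, overrides), encoding="utf-8")
    return target
```

For example, `write_env("path/to/clone", {"DOCS_DIR": "/home/me/docs"})` would prepare the file (both paths are placeholders), after which `docker compose up -d` is run from the repository directory as described above.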

Highlighted Details

  • Hybrid Ingestion & Search: Supports single uploads, folder scans, and hybrid chunking (vector + BM25) for comprehensive data processing.
  • Agent-Ready: Features agent-ready semantic search with k-tuning, citation viewing, and streaming logs, integrating with MCP (Model Context Protocol) clients like OpenClaw.
  • Agent Configuration UI: A dedicated UI tab allows configuring MCP connection details, API URLs, timeouts, and log levels without modifying .env files or using the CLI, including connectivity testing.
  • Local LLM Support: Defaults to Ollama but supports OpenAI-compatible servers via OPENAI_BASE_URL, offering flexibility for local model deployment.
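The provider fallback in the last bullet can be pictured as a lookup on OPENAI_BASE_URL. The sketch below is an assumption about the behavior, not the project's implementation: `resolve_llm_endpoint` is a hypothetical name, and `http://localhost:11434` is Ollama's usual default address on the host, which may differ inside a Docker network.

```python
import os


def resolve_llm_endpoint(env=None):
    """Pick an LLM endpoint: OPENAI_BASE_URL if set, otherwise a local Ollama."""
    env = os.environ if env is None else env
    base_url = env.get("OPENAI_BASE_URL", "").strip()
    if base_url:
        # Any OpenAI-compatible server; trailing slash removed for consistency.
        return {"provider": "openai-compatible", "base_url": base_url.rstrip("/")}
    # Assumption: Ollama's usual default address on the host machine.
    return {"provider": "ollama", "base_url": "http://localhost:11434"}
```

Passing a plain dictionary instead of the process environment keeps the function easy to try out, for example `resolve_llm_endpoint({"OPENAI_BASE_URL": "http://localhost:8000/v1"})`.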

Maintenance & Community

The author has expressed significant distress over alleged plagiarism, stating that they are "out of this game" and will no longer contribute to open source. There is therefore a high risk that maintenance will stop. No community links (Discord, Slack) are provided.

Licensing & Compatibility

The license type is not explicitly stated in the provided README. Compatibility for commercial use or closed-source linking is therefore undetermined.

Limitations & Caveats

The project's future maintenance is highly uncertain due to the author's stated withdrawal from open-source contributions following a dispute. The lack of explicit licensing information poses a significant adoption blocker for commercial or sensitive use cases.

Health Check

  • Last Commit: 2 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 11 stars in the last 30 days
