Knowledge-Base-Self-Hosting-Kit by 2dogsandanerd

RAG system for private code and document querying

Created 7 months ago

265 stars

Top 96.2% on SourcePulse

Project Summary

This project provides a production-ready, self-hosted Retrieval-Augmented Generation (RAG) system for ingesting and querying codebases and documentation, with a focus on privacy and zero configuration. It targets developers and power users who want a private knowledge base integrated with a local LLM, as a self-hosted alternative to cloud-based services.

How It Works

The system runs as a set of Docker containers, using ChromaDB for vector storage and Docling for document ingestion. It processes diverse data types, including PDFs and code repositories, with a hybrid strategy (vector + BM25) whose chunking distinguishes code from prose. A FastAPI backend exposes CRUD, ingestion, and search APIs, and a web UI provides dashboards, ingestion tools, and agent configuration.
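The summary does not say how vector and BM25 results are merged. One common way to fuse two ranked lists is reciprocal rank fusion (RRF). The sketch below is a minimal Python version of that technique, not the project's own code, and the chunk IDs in the example are hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], rrf_k: int = 60) -> list[tuple[str, float]]:
    """Merge several best-first rankings of chunk IDs into one hybrid ranking.

    A chunk earns 1 / (rrf_k + rank) from every list it appears in, and those
    contributions are summed. rrf_k = 60 is the constant commonly used for RRF.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (rrf_k + rank)
    # Highest fused score first; ties are broken by ID so the order is deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    # Hypothetical results: one list from vector similarity, one from BM25.
    vector_hits = ["auth.py#L10", "README.md#setup", "db.py#L42"]
    bm25_hits = ["db.py#L42", "auth.py#L10", "CHANGELOG.md#v2"]
    for chunk_id, score in reciprocal_rank_fusion([vector_hits, bm25_hits]):
        print(f"{score:.4f}  {chunk_id}")
```

RRF only uses ranks, so cosine similarities and BM25 scores never need to be normalized onto a common scale before merging.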

Quick Start & Requirements

Install/Run: Clone the repository, copy .env.example to .env and configure settings (e.g., DOCS_DIR, LLM provider), then run docker compose up -d.
Prerequisites: Docker with Docker Compose is required to run the orchestrated services (ChromaDB, Nginx, the FastAPI backend, etc.).
Resource Footprint: No specific hardware requirements are stated; budget enough CPU, RAM, and disk for the Docker containers plus any local LLM you run.
Links:
- Repository: https://github.com/2dogsandanerd/Knowledge-Base-Self-Hosting-Kit
- UI: http://localhost:8080/
- OpenAPI Docs: http://localhost:8080/docs
- Health Check: http://localhost:8080/health
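Once the stack is up, the health endpoint listed above can be polled before sending ingestion or search requests. The following is a minimal sketch using only the Python standard library; it assumes the default address http://localhost:8080/health and treats any HTTP 200 reply as healthy, because the response body format is not documented in this summary.

```python
import http.client
import urllib.request

# Default address from the Quick Start links; adjust it if you mapped a different port.
HEALTH_URL = "http://localhost:8080/health"


def is_healthy(url: str = HEALTH_URL, timeout: float = 5.0) -> bool:
    """Return True if `url` answers with HTTP 200 within `timeout` seconds."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            # Only the status code is checked; the body format is not documented.
            return response.status == 200
    except (OSError, http.client.HTTPException):
        # OSError covers URLError/HTTPError (refused connection, DNS failure,
        # 4xx/5xx status) and socket timeouts; HTTPException covers malformed replies.
        return False


if __name__ == "__main__":
    print("healthy" if is_healthy() else "not reachable yet")
```

Calling is_healthy() in a short retry loop is a simple way to wait until docker compose up -d has finished starting the containers.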

Highlighted Details

Hybrid Ingestion & Search: Supports single uploads, folder scans, and hybrid chunking (vector + BM25) for comprehensive data processing.
Agent-Ready: Offers semantic search for agents with adjustable top-k, citation viewing, and streaming logs, and integrates with MCP (Model Context Protocol) clients such as OpenClaw.
Agent Configuration UI: A dedicated UI tab lets users set MCP connection details, API URLs, timeouts, and log levels, and test connectivity, without editing .env files or using the CLI.
Local LLM Support: Defaults to Ollama, and any OpenAI-compatible server can be used instead by setting OPENAI_BASE_URL.
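To make the keyword half of hybrid search concrete, here is a minimal Okapi BM25 scorer with a top-k query in Python. It is an illustrative sketch, not the project's implementation: the tokenizer, the k1/b defaults, and the Lucene-style IDF are assumptions, and reading "k-tuning" as the top_k result count is a guess. The example chunks are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercased word/identifier tokens; underscores are kept so names like DOCS_DIR survive.
    return re.findall(r"[a-z0-9_]+", text.lower())


class BM25:
    """Minimal Okapi BM25 index over an in-memory list of text chunks."""

    def __init__(self, chunks: list[str], k1: float = 1.5, b: float = 0.75):
        self.k1, self.b = k1, b
        self.docs = [Counter(tokenize(c)) for c in chunks]
        self.lengths = [sum(d.values()) for d in self.docs]
        self.avgdl = sum(self.lengths) / len(self.docs) if self.docs else 0.0
        # Document frequency: how many chunks contain each term.
        df = Counter(term for d in self.docs for term in d)
        n = len(self.docs)
        # Non-negative IDF variant (the "1 +" inside the log), as used by Lucene.
        self.idf = {t: math.log(1 + (n - f + 0.5) / (f + 0.5)) for t, f in df.items()}

    def score(self, query: str, i: int) -> float:
        """BM25 score of chunk i for the query."""
        doc, length = self.docs[i], self.lengths[i]
        total = 0.0
        for term in tokenize(query):
            tf = doc.get(term, 0)
            if tf == 0:
                continue
            # Length normalization: longer-than-average chunks are penalized.
            norm = self.k1 * (1 - self.b + self.b * length / self.avgdl)
            total += self.idf[term] * tf * (self.k1 + 1) / (tf + norm)
        return total

    def top_k(self, query: str, k: int = 3) -> list[tuple[int, float]]:
        """Return (chunk index, score) for up to k matching chunks, best first."""
        scored = [(i, self.score(query, i)) for i in range(len(self.docs))]
        scored = [pair for pair in scored if pair[1] > 0]
        return sorted(scored, key=lambda p: (-p[1], p[0]))[:k]


if __name__ == "__main__":
    chunks = [
        "def connect_db(url): return create_engine(url)",
        "Set DOCS_DIR in .env to the folder you want indexed.",
        "The health endpoint reports service status.",
    ]
    index = BM25(chunks)
    print(index.top_k("docs_dir env folder", k=2))
```

Keeping identifiers intact as single tokens matters when one corpus mixes code and prose, since a query for DOCS_DIR should not degrade into separate matches on "docs" and "dir".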

Maintenance & Community

The author has expressed significant distress over alleged plagiarism and has stated that they are "out of this game" and will no longer contribute to open source. This signals a high risk that maintenance will stop. No community channels (Discord, Slack) are listed.

Licensing & Compatibility

The license type is not explicitly stated in the provided README. Compatibility for commercial use or closed-source linking is therefore undetermined.

Limitations & Caveats

The project's future maintenance is highly uncertain because the author has withdrawn from open-source contributions following a dispute. The missing license information is also a significant adoption blocker for commercial or sensitive use cases.


Explore Similar Projects

llama-github by JetXu-LLM

RagLangChainTest by NanGePlus

llm-wiki by Pratiyush

rag-all-in-one by lehoanglong95

mcp-local-rag by shinpr

llm-mcp-rag by KelvinQiu802

acemcp by qy527145

sage by Storia-AI

canopy by pinecone-io

mgrep by mixedbread-ai

semble by MinishLab

private-gpt by zylon-ai