WeKnora by Tencent

LLM framework for deep document understanding and RAG

Created 4 months ago

7,682 stars

Top 6.7% on SourcePulse

Project Summary

WeKnora is an LLM-powered framework for deep document understanding, semantic retrieval, and context-aware question answering, built on the retrieval-augmented generation (RAG) paradigm. It targets enterprise knowledge management, research, technical support, legal, and medical use cases, and offers efficient, controllable document Q&A pipelines.

How It Works

WeKnora uses a modular architecture that covers the complete document understanding and retrieval pipeline: multi-modal preprocessing, semantic vector indexing, intelligent retrieval, and LLM inference. Retrieval follows the RAG pattern: contextually relevant document snippets are retrieved and passed to the LLM, which grounds its answer in them. Because the architecture is modular, each component can be configured and extended.
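The retrieve-then-generate flow described above can be illustrated with a minimal sketch. It is not WeKnora's API: the function names (embed, cosine, retrieve, build_prompt) are hypothetical, the bag-of-words "embedding" is a toy stand-in for a real embedding model, and the LLM call itself is left out.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words term count."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, context: list[str]) -> str:
    """Combine retrieved snippets and the question into one LLM prompt."""
    joined = "\n".join(f"- {c}" for c in context)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{joined}\n\nQuestion: {query}"
    )


# Hypothetical document chunks, invented for illustration only.
chunks = [
    "Invoices are due within 30 days of receipt.",
    "The office is closed on public holidays.",
    "Late invoices incur a 2% monthly fee.",
]
question = "when are invoices due"
top = retrieve(question, chunks, k=2)
prompt = build_prompt(question, top)
```

In a real deployment the prompt would be sent to the configured LLM, and the toy embedding would be replaced by an embedding model backed by a vector index.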

Quick Start & Requirements

Installation: Clone the repository, copy .env.example to .env and configure it, then run ./scripts/start_all.sh or make start-all.
Prerequisites: Docker, Docker Compose, Git.
Access: Web UI at http://localhost, API at http://localhost:8080.
Documentation: API documentation
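Before running the start script, it can help to confirm that the prerequisites are present and that .env exists. The following is a minimal preflight sketch under stated assumptions: the helper names (missing_tools, ensure_env_file) are hypothetical and not part of the project, and only the docker and git executables are checked, since Docker Compose may be installed as a Docker plugin rather than as a separate binary on PATH.

```python
import shutil
from pathlib import Path


def missing_tools(tools: list[str]) -> list[str]:
    """Return the tools from the list that are not found on PATH."""
    return [t for t in tools if shutil.which(t) is None]


def ensure_env_file(repo_dir: Path) -> bool:
    """Copy .env.example to .env if .env does not exist yet.

    Returns True when a copy was made, False otherwise. The copied
    file still needs to be edited by hand before starting services.
    """
    example = repo_dir / ".env.example"
    target = repo_dir / ".env"
    if target.exists() or not example.exists():
        return False
    shutil.copyfile(example, target)
    return True
```

A typical use would be to call missing_tools(["docker", "git"]) and stop if the result is non-empty, then call ensure_env_file(Path(".")) from the cloned repository root and edit the resulting .env before running ./scripts/start_all.sh.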

Highlighted Details

Supports parsing of PDF, Word, TXT, and Markdown documents, as well as images (via OCR and captioning).
Integrates with various embedding models (local models, BGE/GTE APIs) and vector databases (PostgreSQL/pgvector, Elasticsearch).
Offers hybrid retrieval strategies, including BM25, dense retrieval, and GraphRAG.
Allows integration with LLMs like Qwen and DeepSeek, supporting local deployments (e.g., via Ollama).
Provides end-to-end testing support with visualization and metrics (Recall, BLEU/ROUGE).
Features a Web UI and RESTful API for ease of use.
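To make the hybrid retrieval and Recall metric mentioned above more concrete, here is a small sketch that merges a keyword (BM25-style) ranking with a dense ranking and then scores the result. The fusion method shown is reciprocal rank fusion, a common general-purpose technique; the source does not say which fusion method WeKnora uses, so treat this as an assumption. The document IDs and relevance labels are invented for illustration.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores 1 / (k + rank) per list it appears in;
    ties are broken alphabetically so the output is deterministic.
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda d: (-scores[d], d))


def recall_at_k(ranked: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the relevant documents that appear in the top k."""
    if not relevant:
        return 0.0
    hits = sum(1 for doc_id in ranked[:k] if doc_id in relevant)
    return hits / len(relevant)


# Hypothetical rankings from a keyword retriever and a dense retriever.
bm25_ranking = ["doc_a", "doc_b", "doc_c"]
dense_ranking = ["doc_b", "doc_d", "doc_a"]

fused = reciprocal_rank_fusion([bm25_ranking, dense_ranking])
recall = recall_at_k(fused, relevant={"doc_a", "doc_b"}, k=2)
```

Documents ranked well by both retrievers (doc_a and doc_b here) rise to the top of the fused list, which is the usual motivation for combining sparse and dense retrieval.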

Maintenance & Community

The project welcomes community contributions via Issues and Pull Requests, with guidelines for bug fixes, new features, documentation improvements, and UI/UX optimizations. Code contributions should follow Go Code Review Comments and use gofmt. Commit messages should adhere to Conventional Commits.
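As an illustration of the commit message convention, the sketch below checks whether a commit header has the Conventional Commits shape "type(optional scope)!: description". It is a simplified check, not the project's tooling: the function name is hypothetical, only the header line is validated, and types are required to be lowercase, which is stricter than the specification.

```python
import re

# Header shape: type, optional (scope), optional "!", then ": description".
HEADER = re.compile(
    r"^(?P<type>[a-z]+)"
    r"(\((?P<scope>[^()\s]+)\))?"
    r"(?P<breaking>!)?"
    r": (?P<description>\S.*)$"
)


def is_conventional(header: str) -> bool:
    """Return True if the commit header matches the simplified pattern."""
    return HEADER.match(header) is not None
```

For example, "feat(parser): add OCR fallback" and "fix: handle empty PDF" pass this check, while "Fixed stuff" does not.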

Licensing & Compatibility

The project is released under the MIT license, allowing free use, modification, and distribution with attribution.

Limitations & Caveats

The README does not explicitly detail limitations or known issues. The project is developed by Tencent, suggesting potential enterprise-grade backing but also a possible focus on internal use cases.

Explore Similar Projects

superlinked by superlinked

ColiVara by tjmlabs

localGPT-Vision by PromtEngineer

chatWeb by SkywalkerDarren

Local_Pdf_Chat_RAG by weiwill88

raptor by parthsarthi03

nv-ingest by NVIDIA

PageIndex by VectifyAI

Chinese-LangChain by yanqiangmiffy

LangChain-ChatGLM-Webui by X-D-Lab

pdfGPT by bhaskatripathi

ragflow by infiniflow