WeKnora by Tencent

LLM framework for deep document understanding and RAG

Created 4 months ago
7,682 stars

Top 6.7% on SourcePulse

Project Summary

WeKnora is an LLM-powered framework designed for deep document understanding, semantic retrieval, and context-aware question answering, leveraging the RAG paradigm. It targets enterprise knowledge management, research, technical support, legal, and medical domains, offering efficient and controllable document Q&A pipelines.

How It Works

WeKnora uses a modular architecture that covers the full document understanding and retrieval pipeline: multi-modal preprocessing, semantic vector indexing, intelligent retrieval, and LLM inference. Retrieval follows the RAG pattern: contextually relevant document snippets are retrieved and passed to the LLM together with the question, so answers are grounded in the source documents. Because the architecture is modular, each component can be configured or extended independently.
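The retrieve-then-generate flow described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not WeKnora's implementation: the helper names (`chunk`, `embed`, `retrieve`, `build_prompt`, `answer`) are hypothetical, the "embedding" is a toy term-frequency vector standing in for a real embedding model, and `llm` is any caller-supplied function that maps a prompt string to a reply.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def chunk(text, max_words=40):
    """Split a document into chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def embed(text):
    """Toy embedding (assumption): a term-frequency vector, not a real model."""
    return Counter(tokenize(text))


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, index, top_k=2):
    """Return the top_k chunk texts most similar to the query."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [text for text, _ in ranked[:top_k]]


def build_prompt(query, snippets):
    """Combine the retrieved snippets and the question into one prompt."""
    context = "\n".join(f"[{i + 1}] {s}" for i, s in enumerate(snippets))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )


def answer(query, documents, llm, top_k=2):
    """Index the documents, retrieve context, and ask the supplied LLM callable."""
    index = [(c, embed(c)) for doc in documents for c in chunk(doc)]
    return llm(build_prompt(query, retrieve(query, index, top_k)))
```

In a real deployment each stage would be swapped for a proper component (a document parser, an embedding model, a vector database, an LLM client), which is the kind of per-component substitution the modular design is meant to allow.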

Quick Start & Requirements

  • Installation: Clone the repository, copy .env.example to .env and configure it, then run ./scripts/start_all.sh or make start-all.
  • Prerequisites: Docker, Docker Compose, Git.
  • Access: Web UI at http://localhost, API at http://localhost:8080.
  • Documentation: API documentation
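After the start script finishes, one quick way to confirm that both endpoints are up is a plain TCP connection check. The sketch below uses only the Python standard library. The helper names `is_reachable` and `check_services` are hypothetical, and the check only tests whether the ports accept connections; it does not call any WeKnora API route, since none are listed here.

```python
import socket
from urllib.parse import urlsplit

# Default addresses taken from the Quick Start section above.
DEFAULT_SERVICES = {
    "Web UI": "http://localhost",
    "API": "http://localhost:8080",
}


def is_reachable(url, timeout=2.0):
    """Return True if a TCP connection to the URL's host and port succeeds."""
    parts = urlsplit(url)
    port = parts.port or (443 if parts.scheme == "https" else 80)
    try:
        with socket.create_connection((parts.hostname, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_services(services=DEFAULT_SERVICES):
    """Map each service name to whether its address accepts connections."""
    return {name: is_reachable(url) for name, url in services.items()}
```

Calling `check_services()` returns a dictionary such as `{"Web UI": True, "API": True}` when both containers are listening; a `False` value usually means the corresponding container has not started or the port is mapped differently in `.env`.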

Highlighted Details

  • Parses PDF, Word, TXT, and Markdown documents, as well as images (via OCR and captioning).
  • Integrates with various embedding models (local, BGE/GTE API) and vector databases (PostgreSQL/pgvector, Elasticsearch).
  • Offers hybrid retrieval strategies, including BM25, dense retrieval, and GraphRAG.
  • Allows integration with LLMs like Qwen and DeepSeek, supporting local deployments (e.g., via Ollama).
  • Provides end-to-end testing support with visualization and metrics (Recall, BLEU/ROUGE).
  • Features a Web UI and RESTful API for ease of use.
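To make the hybrid retrieval item more concrete, here is a minimal sketch of combining a BM25 keyword ranking with a dense (embedding-based) ranking. The summary does not say how WeKnora merges the two result lists, so reciprocal rank fusion (RRF) is used here purely as an assumption, and the function names `bm25_rank` and `rrf` are hypothetical. The dense ranking is taken as an input because it would normally come from a vector database such as pgvector or Elasticsearch.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_rank(query, docs, k1=1.5, b=0.75):
    """Return document indices ordered by Okapi BM25 score, best first."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    avg_len = (sum(len(t) for t in tokenized) / n) or 1.0
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            length_norm = k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores.append(score)
    return sorted(range(n), key=lambda i: scores[i], reverse=True)


def rrf(rankings, k=60):
    """Fuse several best-first rankings with reciprocal rank fusion."""
    fused = Counter()
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] += 1.0 / (k + rank)
    return sorted(fused, key=lambda d: fused[d], reverse=True)
```

A typical call would be `rrf([bm25_rank(query, docs), dense_ranking])`, where `dense_ranking` is the list of document indices returned by the vector search. RRF only looks at ranks, so the two retrievers do not need comparable score scales.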

Maintenance & Community

The project welcomes community contributions via Issues and Pull Requests, with guidelines for bug fixes, new features, documentation improvements, and UI/UX optimizations. Code contributions should follow Go Code Review Comments and use gofmt. Commit messages should adhere to Conventional Commits.
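Contributors who want to check a commit header against the Conventional Commits shape before pushing could use something like the sketch below. It is not part of the project's tooling: the function name `check_commit_header` is hypothetical, and the list of accepted types beyond `feat` and `fix` follows a widely used convention rather than anything this project or the specification mandates.

```python
import re

# Assumption: a commonly used type list; only "feat" and "fix" are named
# by the Conventional Commits specification itself.
TYPES = (
    "feat", "fix", "docs", "style", "refactor",
    "perf", "test", "build", "ci", "chore", "revert",
)

# Shape: type, optional (scope), optional "!", then ": " and a description.
HEADER = re.compile(
    r"^(?P<type>" + "|".join(TYPES) + r")"
    r"(?:\((?P<scope>[^()\s]+)\))?"
    r"(?P<breaking>!)?"
    r": (?P<description>\S.*)$"
)


def check_commit_header(message):
    """Return True if the first line of the message matches the header shape."""
    header = message.splitlines()[0] if message else ""
    return HEADER.match(header) is not None
```

For example, `check_commit_header("feat(parser): add OCR fallback")` passes, while a header such as `Updated stuff` does not.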

Licensing & Compatibility

The project is released under the MIT license, allowing free use, modification, and distribution with attribution.

Limitations & Caveats

The README does not explicitly detail limitations or known issues. The project is developed by Tencent, suggesting potential enterprise-grade backing but also a possible focus on internal use cases.

Health Check

  • Last commit: 4 days ago
  • Responsiveness: Inactive
  • Pull requests (30d): 22
  • Issues (30d): 21

Star History

  • 645 stars in the last 30 days
