WeKnora by Tencent

LLM framework for deep document understanding and RAG

created 3 weeks ago

1,607 stars

Top 26.1% on SourcePulse

Project Summary

WeKnora is an LLM-powered framework for deep document understanding, semantic retrieval, and context-aware question answering, built on the retrieval-augmented generation (RAG) paradigm. It targets enterprise knowledge management, research, technical support, legal, and medical use cases, and offers efficient, controllable document Q&A pipelines.

How It Works

WeKnora employs a modular architecture for a complete document understanding and retrieval pipeline. It integrates multi-modal preprocessing, semantic vector indexing, intelligent retrieval, and LLM inference. The core retrieval process uses RAG, combining contextually relevant document snippets with LLMs for high-quality semantic answers. This approach allows for flexible configuration and extension of each component.
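To make the retrieve-then-generate flow concrete, here is a minimal sketch in plain Python. It is not WeKnora's code or API: the `embed`, `retrieve`, and `build_prompt` helpers are hypothetical names, and a toy bag-of-words vector stands in for a real embedding model. It only illustrates how retrieved snippets are combined with a question before being passed to an LLM.

```python
import math
from collections import Counter


def embed(text):
    """Toy bag-of-words vector; a stand-in for a real embedding model."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, chunks, k=2):
    """Return the k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(query, snippets):
    """Assemble the retrieved snippets and the question into one LLM prompt."""
    context = "\n".join(f"- {s}" for s in snippets)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )


# Hypothetical document chunks, as they might look after parsing and splitting.
chunks = [
    "Invoices are archived for seven years.",
    "The cafeteria opens at nine.",
    "Archived invoices can be restored by the finance team.",
]
question = "how long are invoices archived"
top = retrieve(question, chunks, k=2)
prompt = build_prompt(question, top)
```

In a real deployment the `prompt` string would be sent to the configured LLM, and each helper would be replaced by the corresponding pipeline component.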

Quick Start & Requirements

  • Installation: Clone the repository, copy .env.example to .env and configure it, then run ./scripts/start_all.sh or make start-all.
  • Prerequisites: Docker, Docker Compose, Git.
  • Access: Web UI at http://localhost, API at http://localhost:8080.
  • Documentation: API documentation
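After the start script finishes, a quick reachability check against the two addresses listed above can confirm that the services came up. The sketch below uses only the Python standard library; the helper names `is_reachable` and `check_all` are hypothetical, and it assumes nothing about WeKnora's API routes beyond the base URLs given in the list.

```python
import http.client
import urllib.error
import urllib.request

# Base URLs taken from the Quick Start list; no specific API paths are assumed.
ENDPOINTS = {
    "web_ui": "http://localhost",
    "api": "http://localhost:8080",
}


def is_reachable(url, timeout=2.0):
    """Return True if an HTTP server answers at url, whatever the status code."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server responded with an error status, so it is up.
        return True
    except (OSError, http.client.HTTPException):
        # Connection refused, timeout, DNS failure, or a malformed response.
        return False


def check_all(endpoints, timeout=2.0):
    """Map each endpoint name to its reachability."""
    return {name: is_reachable(url, timeout) for name, url in endpoints.items()}
```

Calling `check_all(ENDPOINTS)` returns a dictionary such as `{"web_ui": True, "api": True}` once both containers are serving requests.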

Highlighted Details

  • Parses PDF, Word, TXT, Markdown, and image documents (images via OCR/captioning).
  • Integrates with various embedding models (local, BGE/GTE API) and vector databases (PostgreSQL/pgvector, Elasticsearch).
  • Offers hybrid retrieval strategies combining BM25, dense retrieval, and GraphRAG.
  • Allows integration with LLMs like Qwen and DeepSeek, supporting local deployments (e.g., via Ollama).
  • Provides end-to-end testing support with visualization and metrics (Recall, BLEU/ROUGE).
  • Features a Web UI and RESTful API for ease of use.
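Hybrid retrieval needs some way to merge the ranked lists produced by different retrievers, such as BM25 and a dense retriever. The summary does not say how WeKnora merges them, so the sketch below uses reciprocal rank fusion (RRF) purely as an illustration of one common technique. The function name, the document IDs, and the constant `k=60` are assumptions.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs with reciprocal rank fusion.

    Each document scores 1 / (k + rank) for every list it appears in,
    with rank starting at 1; higher total scores rank first.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort by descending score, breaking ties by document ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical outputs of a keyword retriever and a dense retriever.
bm25_ranking = ["doc_a", "doc_b", "doc_c"]
dense_ranking = ["doc_b", "doc_d", "doc_a"]

fused = reciprocal_rank_fusion([bm25_ranking, dense_ranking])
# fused == ["doc_b", "doc_a", "doc_d", "doc_c"]
```

Documents that both retrievers rank highly (`doc_b`, `doc_a`) rise to the top, while documents found by only one retriever still appear further down the fused list.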

Maintenance & Community

The project welcomes community contributions via Issues and Pull Requests, with guidelines for bug fixes, new features, documentation improvements, and UI/UX optimizations. Code contributions should follow Go Code Review Comments and use gofmt. Commit messages should adhere to Conventional Commits.

Licensing & Compatibility

The project is released under the MIT license, allowing free use, modification, and distribution with attribution.

Limitations & Caveats

The README does not explicitly detail limitations or known issues. The project is developed by Tencent, suggesting potential enterprise-grade backing but also a possible focus on internal use cases.

Health Check

  • Last commit: 2 days ago
  • Responsiveness: Inactive
  • Pull requests (30d): 29
  • Issues (30d): 71
  • Star history: 1,668 stars in the last 26 days
