QAnything by netease-youdao

Question-answering system for local knowledge bases, supporting diverse file formats

Created 2 years ago

13,857 stars

Top 3.6% on SourcePulse

2 Experts Love This Project

Chip Huyen

Author of "AI Engineering", "Designing Machine Learning Systems"

Yaowei Zheng

Author of LLaMA-Factory

Project Summary

QAnything is an open-source, local knowledge base question-answering system designed for users who need to query information from various document types securely and offline. It supports a wide array of file formats including PDF, DOCX, PPTX, XLSX, images, and web links, offering cross-language Q&A capabilities and efficient retrieval even with massive datasets.

How It Works

QAnything employs a two-stage retrieval process to overcome the performance degradation common in large-scale RAG systems. The first stage uses embedding models (specifically BCEmbedding, noted for bilingual and cross-lingual proficiency) for initial candidate retrieval. The second stage applies a reranking model to refine the results, significantly improving accuracy. This approach, validated by MTEB and LlamaIndex benchmarks, ensures stable accuracy gains as data volume increases. The system is built with independent, replaceable components for parsing, OCR, embedding, and reranking, and defaults to CPU execution for broad hardware compatibility.
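
The retrieve-then-rerank flow described above can be illustrated with a minimal Python sketch. It is not QAnything's implementation: the bag-of-words `embed` function stands in for an embedding model such as BCEmbedding, the bigram-overlap `rerank_score` stands in for a real reranking model, and all function names, parameters, and sample documents are hypothetical.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def rerank_score(query, doc):
    """Toy stand-in for a reranker: share of query bigrams present in the doc."""
    q, d = query.lower().split(), doc.lower().split()
    q_bigrams = set(zip(q, q[1:]))
    d_bigrams = set(zip(d, d[1:]))
    if not q_bigrams:
        return 0.0
    return len(q_bigrams & d_bigrams) / len(q_bigrams)


def two_stage_search(query, docs, first_stage_k=3, final_k=1):
    """Shortlist with cheap vector similarity, then reorder the shortlist."""
    q_vec = embed(query)
    # Stage 1: score every document with the cheap embedding similarity.
    candidates = sorted(
        docs, key=lambda d: cosine(q_vec, embed(d)), reverse=True
    )[:first_stage_k]
    # Stage 2: apply the costlier pairwise score to the shortlist only.
    return sorted(
        candidates, key=lambda d: rerank_score(query, d), reverse=True
    )[:final_k]


# Hypothetical sample corpus and query, for illustration only.
docs = [
    "reset the router password from the admin page",
    "the password page lets the admin reset the router",
    "install the printer driver",
    "quarterly sales report",
]
query = "reset the router password"
best = two_stage_search(query, docs)
```

The first stage scores every document, so it has to be cheap. The second stage compares the query with each shortlisted document directly, which is costlier per pair but only runs on `first_stage_k` candidates.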

Quick Start & Requirements

Installation: One-command startup with Docker Compose. The compose file depends on the OS: docker compose -f docker-compose-linux.yaml up (Linux), docker compose -f docker-compose-mac.yaml up (macOS), or docker compose -f docker-compose-win.yaml up (Windows).
Prerequisites: Docker >= 20.10.5, Docker Compose >= 2.23.3, and at least 20 GB of RAM.
Access: Frontend at http://localhost:8777/qanything/.
Docs: QAnything API documentation
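
As a rough illustration of the steps above, the following Python sketch picks the compose file for the current OS and checks version strings against the stated minimums. The compose file names and minimum versions come from the list above; the helper names (`parse_version`, `meets_minimum`, `compose_file_for`, `build_command`) are hypothetical and not part of QAnything. The sketch only builds the command and does not run Docker.

```python
import platform
import re

# Minimum versions taken from the prerequisites listed above.
MIN_DOCKER = (20, 10, 5)
MIN_COMPOSE = (2, 23, 3)

# Compose file per OS, keyed by the values platform.system() returns.
COMPOSE_FILES = {
    "Linux": "docker-compose-linux.yaml",
    "Darwin": "docker-compose-mac.yaml",
    "Windows": "docker-compose-win.yaml",
}


def parse_version(text):
    """Turn a string such as 'v2.23.3' into a (major, minor, patch) tuple."""
    parts = [int(n) for n in re.findall(r"\d+", text)[:3]]
    parts += [0] * (3 - len(parts))  # pad short versions such as '24.0'
    return tuple(parts)


def meets_minimum(version, minimum):
    """Return True if the version string is at least the minimum tuple."""
    return parse_version(version) >= minimum


def compose_file_for(system=None):
    """Pick the compose file for the given OS name (default: current OS)."""
    system = system or platform.system()
    try:
        return COMPOSE_FILES[system]
    except KeyError:
        raise ValueError(f"no compose file known for {system!r}") from None


def build_command(system=None):
    """Build (but do not run) the startup command as an argument list."""
    return ["docker", "compose", "-f", compose_file_for(system), "up"]
```

For example, `build_command("Darwin")` yields the argument list for the macOS command, which could then be passed to `subprocess.run` from the repository root.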

Highlighted Details

Version 2.0.0 improves usability, resource consumption, parsing, and retrieval, and merges the earlier separate Docker and Python versions into a single Docker Compose setup.
Enhanced parsing capabilities for complex tables, multi-column text, and cross-page layouts, with improved handling of images and URLs.
Defaults to CPU-only operation, making it hardware-friendly and eliminating GPU dependencies for core functionality.
Supports offline use by deploying local large models (e.g., via Ollama) and provides detailed logs for debugging.

Maintenance & Community

Latest major release: version 2.0.0, released on August 23, 2024.
Community support via Discord and WeChat. Feedback can be provided via GitHub issues/discussions or email (qanything@rd.netease.com).
Roadmap: QAnything Roadmap.

Licensing & Compatibility

Licensed under AGPL-3.0.
AGPL-3.0 is a strong copyleft license: derivative works must be released under the same license, and the source-sharing obligation also applies to modified versions that users interact with over a network. Commercial use, or linking with closed-source applications, requires careful review of these obligations.

Limitations & Caveats

The AGPL-3.0 license may impose significant restrictions on commercial or closed-source usage.
Audio file support was temporarily removed due to speed and resource consumption concerns.
Although the default configuration is optimized for CPU, very large datasets or complex queries may still benefit from hardware acceleration where it is available.

Health Check

Last Commit

11 months ago

Responsiveness

1 week

Star History

53 stars in the last 30 days