QAnything by netease-youdao

Anything Q&A system for local knowledge bases, supporting diverse file formats

created 1 year ago
13,424 stars

Top 3.8% on sourcepulse

Project Summary

QAnything is an open-source, local knowledge base question-answering system designed for users who need to query information from various document types securely and offline. It supports a wide array of file formats including PDF, DOCX, PPTX, XLSX, images, and web links, offering cross-language Q&A capabilities and efficient retrieval even with massive datasets.

How It Works

QAnything uses a two-stage retrieval process to counter the performance degradation that large-scale RAG systems commonly show as the corpus grows. The first stage uses an embedding model (BCEmbedding, noted for bilingual and cross-lingual proficiency) to retrieve an initial set of candidates. The second stage applies a reranking model to reorder those candidates, which improves accuracy. According to the project, this approach is validated on MTEB and LlamaIndex benchmarks and yields stable accuracy gains as data volume increases. The system is built from independent, replaceable components for parsing, OCR, embedding, and reranking, and it defaults to CPU execution for broad hardware compatibility.
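
The retrieve-then-rerank flow described above can be sketched in a few lines of Python. This is only an illustration of the general pattern, not QAnything's implementation: the bag-of-words `embed` function and the bigram-based `rerank_score` are toy stand-ins (assumptions) for BCEmbedding and the reranking model, and all function names here are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (toy; real models use subword tokenizers)."""
    return text.lower().split()


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(tokenize(text))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def first_stage(query, docs, k):
    """Stage 1: cheap vector similarity over the whole corpus, keep the top k."""
    q = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]


def rerank_score(query, doc):
    """Toy stand-in for a reranker: scores the (query, doc) pair jointly by
    counting query bigrams that also occur, in order, in the document."""
    q = tokenize(query)
    d = tokenize(doc)
    doc_bigrams = set(zip(d, d[1:]))
    return sum(1 for bigram in zip(q, q[1:]) if bigram in doc_bigrams)


def second_stage(query, candidates, n):
    """Stage 2: rescore only the k candidates with the costlier pairwise scorer."""
    ranked = sorted(candidates, key=lambda d: rerank_score(query, d), reverse=True)
    return ranked[:n]


def two_stage_retrieve(query, docs, k=3, n=1):
    """Run both stages: broad candidate retrieval, then reranking."""
    return second_stage(query, first_stage(query, docs, k), n)
```

The design point is that the first stage is cheap enough to run over every document, while the second stage sees only `k` candidates and can therefore afford a scorer that looks at the query and document together. In this toy version, word order is invisible to the first stage but visible to the second, so the reranker can promote a candidate that the vector similarity placed lower.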

Quick Start & Requirements

  • Installation: Uses Docker Compose for one-click startup. The command varies by OS:
      Linux: docker compose -f docker-compose-linux.yaml up
      macOS: docker compose -f docker-compose-mac.yaml up
      Windows: docker compose -f docker-compose-win.yaml up
  • Prerequisites: Docker version >= 20.10.5, Docker Compose version >= 2.23.3. Requires >= 20GB RAM.
  • Access: Frontend at http://localhost:8777/qanything/.
  • Docs: QAnything API documentation
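
As a hedged illustration of the prerequisites and per-OS commands listed above, the following Python sketch checks a version string against the stated minimums and picks the compose file for the current platform. The helper names (`parse_version`, `meets_minimum`, `compose_command`) are hypothetical, and the mapping from `platform.system()` values to compose files is an assumption based on the file names; the script only builds the command and does not run Docker.

```python
import platform
import re

# Minimum versions stated in the prerequisites.
MIN_DOCKER = (20, 10, 5)
MIN_COMPOSE = (2, 23, 3)

# Assumed mapping from platform.system() values to the compose files named above.
COMPOSE_FILES = {
    "Linux": "docker-compose-linux.yaml",
    "Darwin": "docker-compose-mac.yaml",
    "Windows": "docker-compose-win.yaml",
}


def parse_version(text):
    """Extract the first x.y.z version triple from a string."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version found in {text!r}")
    return tuple(int(part) for part in match.groups())


def meets_minimum(version_text, minimum):
    """Return True if the version found in version_text is >= minimum."""
    return parse_version(version_text) >= minimum


def compose_command(system=None):
    """Build the startup command for the given (or current) platform."""
    system = system or platform.system()
    try:
        compose_file = COMPOSE_FILES[system]
    except KeyError:
        raise ValueError(f"unsupported platform: {system!r}") from None
    return ["docker", "compose", "-f", compose_file, "up"]
```

A typical use would be to pass the output of `docker --version` and `docker compose version` to `meets_minimum`, then hand the list returned by `compose_command()` to `subprocess.run`.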

Highlighted Details

  • Version 2.0.0 offers significant improvements in usability, resource consumption, parsing, and retrieval, merging old Docker and Python versions into a unified Docker Compose setup.
  • Enhanced parsing capabilities for complex tables, multi-column text, and cross-page layouts, with improved handling of images and URLs.
  • Defaults to CPU-only operation, making it hardware-friendly and eliminating GPU dependencies for core functionality.
  • Supports offline use by deploying local large models (e.g., via Ollama) and provides detailed logs for debugging.

Maintenance & Community

  • Actively maintained with version 2.0.0 released on August 23, 2024.
  • Community support via Discord and WeChat. Feedback can be provided via GitHub issues/discussions or email (qanything@rd.netease.com).
  • Roadmap available at QAnything Roadmap.

Licensing & Compatibility

  • Licensed under AGPL-3.0.
  • AGPL-3.0 is a strong copyleft license: derivative works must be released under the same license, and modified versions that users interact with over a network must offer those users the corresponding source. Commercial use or linking with closed-source applications requires careful review of these obligations.

Limitations & Caveats

  • The AGPL-3.0 license may impose significant restrictions on commercial or closed-source usage.
  • Audio file support was temporarily removed due to speed and resource consumption concerns.
  • Although the defaults are optimized for CPU, very large datasets or complex queries may still run faster with hardware acceleration, if it is integrated into the deployment.

Health Check

  • Last commit: 4 months ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 1
  • Star history: 383 stars in the last 90 days
