RAGchain  by Marker-Inc-Korea

RAG framework extending LangChain for advanced workflows

created 2 years ago
281 stars

Top 93.7% on sourcepulse

Project Summary

RAGchain is a Python framework for building advanced Retrieval Augmented Generation (RAG) workflows. It targets developers and researchers who find libraries such as LangChain or LlamaIndex insufficient for complex, high-accuracy RAG implementations. It adds OCR document loading, integrated reranking, and support for multiple retrievers, with the aim of simplifying the construction of sophisticated RAG systems.

How It Works

RAGchain separates retrieval from content storage, using a "Linker" to connect multiple retrievers and databases. This architecture facilitates the use of diverse retrieval strategies (e.g., BM25, vector DB, hybrid) and rerankers (e.g., UPR, TART, MonoT5) to improve accuracy. It also incorporates OCR loaders (Nougat, Deepdoctection) for better document ingestion and provides pre-made RAG pipelines for rapid deployment of complex workflows.
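The separation of retrieval from content storage can be illustrated with a small sketch. This is not RAGchain's API: `InMemoryStore`, `ToyLinker`, `keyword_retriever`, and `build_demo` are hypothetical names invented for this example, and the keyword-overlap scoring is only a stand-in for a real retriever such as BM25 or a vector index. The point is the shape of the design: retrievers return passage IDs only, and a linker-style registry resolves each ID against whichever store holds the text.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class InMemoryStore:
    """Hypothetical content store: maps passage IDs to passage text."""
    name: str
    passages: dict[str, str] = field(default_factory=dict)

    def save(self, passage_id: str, text: str) -> None:
        self.passages[passage_id] = text

    def fetch(self, passage_id: str) -> str:
        return self.passages[passage_id]


class ToyLinker:
    """Hypothetical registry recording which store owns each passage ID."""

    def __init__(self) -> None:
        self._owner: dict[str, InMemoryStore] = {}

    def register(self, passage_id: str, store: InMemoryStore) -> None:
        self._owner[passage_id] = store

    def resolve(self, passage_ids: list[str]) -> list[str]:
        # Retrievers hand back IDs only; the linker looks up the text.
        return [self._owner[pid].fetch(pid) for pid in passage_ids]


def keyword_retriever(query: str, index: dict[str, set[str]], top_k: int = 2) -> list[str]:
    """Toy retriever: rank passage IDs by how many query terms they share."""
    terms = set(query.lower().split())
    scored = [(len(terms & words), pid) for pid, words in index.items()]
    scored = [(score, pid) for score, pid in scored if score > 0]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [pid for _, pid in scored[:top_k]]


def build_demo() -> tuple[ToyLinker, dict[str, set[str]]]:
    """Two separate stores, one shared linker, one retrieval index."""
    wiki, papers = InMemoryStore("wiki"), InMemoryStore("papers")
    linker, index = ToyLinker(), {}
    corpus = [
        ("w1", wiki, "reranking improves retrieval accuracy"),
        ("p1", papers, "hybrid retrieval combines bm25 and vector search"),
    ]
    for pid, store, text in corpus:
        store.save(pid, text)
        linker.register(pid, store)
        index[pid] = set(text.split())
    return linker, index


if __name__ == "__main__":
    linker, index = build_demo()
    ids = keyword_retriever("hybrid retrieval", index)
    print(linker.resolve(ids))
```

Because the retriever never touches passage text directly, a second retriever over the same IDs, or a second store, can be added without changing the other side.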

Quick Start & Requirements

  • Install via pip: pip install RAGchain
  • For development, clone the repository and run python3 setup.py develop.
  • Additional development requirements can be installed with pip install -r dev_requirements.txt.
  • Supports various LLMs and vector databases, and includes integrations for web search (Google, Bing).
  • Links: Docs, API Spec, QuickStart

Highlighted Details

  • Advanced RAG features: Time-Aware RAG, Importance-Aware RAG.
  • Multiple retrieval options: BM25, Vector DB, Hybrid (rrf: reciprocal rank fusion; cc: convex combination).
  • Integrated rerankers: UPR, TART, BM25, MonoT5.
  • OCR Loaders: Nougat, Deepdoctection.
  • Supports numerous datasets for evaluation, including MS-MARCO, Natural QA, and TriviaQA.
  • Includes easy benchmarking modules for workflow evaluation.
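
The two hybrid fusion modes listed above can be sketched generically. This is a minimal illustration of the techniques, not RAGchain's implementation: the function names, the `k=60` constant (a common default for reciprocal rank fusion), the min-max normalization, and the `alpha` weight are all assumptions chosen for the example.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked ID lists: each list adds 1 / (k + rank) to a document's score."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


def _min_max(scores):
    """Rescale raw scores to [0, 1] so different retrievers are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {d: 0.0 for d in scores}
    return {d: (s - lo) / (hi - lo) for d, s in scores.items()}


def convex_combination(sparse_scores, dense_scores, alpha=0.5):
    """Fuse scores as alpha * sparse + (1 - alpha) * dense after normalization."""
    sparse_n, dense_n = _min_max(sparse_scores), _min_max(dense_scores)
    fused = {}
    for doc_id in set(sparse_n) | set(dense_n):
        fused[doc_id] = (alpha * sparse_n.get(doc_id, 0.0)
                         + (1 - alpha) * dense_n.get(doc_id, 0.0))
    return sorted(fused, key=lambda d: (-fused[d], d))


if __name__ == "__main__":
    bm25_ranking = ["a", "b", "c"]
    vector_ranking = ["b", "c", "a"]
    print(reciprocal_rank_fusion([bm25_ranking, vector_ranking]))

    bm25_scores = {"a": 10.0, "b": 5.0, "c": 0.0}
    vector_scores = {"a": 0.2, "b": 0.9, "c": 0.1}
    print(convex_combination(bm25_scores, vector_scores, alpha=0.5))
```

Reciprocal rank fusion uses only rank positions, so it needs no score normalization. Convex combination uses the raw scores and therefore depends on how they are rescaled and on the choice of `alpha`.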

Maintenance & Community

The project is an early version and welcomes contributions via issues and pull requests. Further community engagement details are not specified in the README.

Licensing & Compatibility

Licensed under the Apache 2.0 License. This license is permissive and generally compatible with commercial use and closed-source linking.

Limitations & Caveats

The README describes the project as an early version, so it may be unstable. It does not list specific limitations or unsupported features beyond this general caveat.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 day
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 5 stars in the last 90 days
