RAGchain by Marker-Inc-Korea

RAG framework extending LangChain for advanced workflows

Created 2 years ago
283 stars

Top 92.3% on SourcePulse

Project Summary

RAGchain is a Python framework for building advanced Retrieval-Augmented Generation (RAG) workflows. It targets developers and researchers who find existing libraries such as LangChain or LlamaIndex insufficient for complex, high-accuracy RAG implementations. It adds OCR document loading, integrated reranking, and support for multiple retrievers, with the aim of simplifying the creation of sophisticated RAG systems.

How It Works

RAGchain separates retrieval from content storage, using a "Linker" to connect multiple retrievers and databases. This architecture facilitates the use of diverse retrieval strategies (e.g., BM25, vector DB, hybrid) and rerankers (e.g., UPR, TART, MonoT5) to improve accuracy. It also incorporates OCR loaders (Nougat, Deepdoctection) for better document ingestion and provides pre-made RAG pipelines for rapid deployment of complex workflows.
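The separation of retrieval from content storage can be illustrated with a minimal sketch. Every class and method name below (`Passage`, `InMemoryDB`, `KeywordRetriever`, `Linker`, `register`, `resolve`) is hypothetical and chosen for illustration; none of it is RAGchain's actual API. The assumption is that retrievers index and return only passage ids, while a linker records which store holds each id and fetches the content on demand.

```python
from dataclasses import dataclass


@dataclass
class Passage:
    id: str
    content: str


class InMemoryDB:
    """Hypothetical content store: maps passage ids to full passages."""

    def __init__(self, passages):
        self._by_id = {p.id: p for p in passages}

    def fetch(self, passage_id):
        return self._by_id[passage_id]


class KeywordRetriever:
    """Hypothetical retriever: keeps ids and token sets, returns ids only."""

    def __init__(self):
        self._index = {}

    def ingest(self, passages):
        for p in passages:
            self._index[p.id] = set(p.content.lower().split())

    def retrieve_ids(self, query, top_k=2):
        terms = set(query.lower().split())
        # Score each passage by the number of query terms it shares.
        scored = [(len(terms & toks), pid) for pid, toks in self._index.items()]
        scored = [s for s in scored if s[0] > 0]
        scored.sort(key=lambda s: (-s[0], s[1]))
        return [pid for _, pid in scored[:top_k]]


class Linker:
    """Hypothetical registry recording which store holds each passage id."""

    def __init__(self):
        self._where = {}

    def register(self, db, passages):
        for p in passages:
            self._where[p.id] = db

    def resolve(self, ids):
        # Look up the owning store for each id and fetch the content there.
        return [self._where[i].fetch(i) for i in ids]


# Two independent stores behind one retriever.
papers = [
    Passage("p1", "BM25 is a sparse ranking function"),
    Passage("p2", "Dense vectors encode semantics"),
]
notes = [Passage("n1", "Rerankers reorder retrieved passages")]

linker = Linker()
linker.register(InMemoryDB(papers), papers)
linker.register(InMemoryDB(notes), notes)

retriever = KeywordRetriever()
retriever.ingest(papers + notes)

hits = linker.resolve(retriever.retrieve_ids("sparse ranking function"))
```

Because the retriever never holds passage content, a second retriever (for example a vector index) could be added over the same ids without duplicating or moving the stored documents.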

Quick Start & Requirements

  • Install via pip: pip install RAGchain
  • For development, clone the repository and run python3 setup.py develop.
  • Additional development requirements can be installed with pip install -r dev_requirements.txt.
  • Supports various LLM models, vector databases, and includes integrations for web search (Google, Bing).
  • Links: Docs, API Spec, QuickStart

Highlighted Details

  • Advanced RAG features: Time-Aware RAG, Importance-Aware RAG.
  • Multiple retrieval options: BM25, Vector DB, Hybrid (rrf, cc).
  • Integrated rerankers: UPR, TART, BM25, MonoT5.
  • OCR Loaders: Nougat, Deepdoctection.
  • Supports numerous datasets for evaluation, including MS-MARCO, Natural QA, and TriviaQA.
  • Includes easy benchmarking modules for workflow evaluation.
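The two hybrid fusion options listed above, rrf (reciprocal rank fusion) and cc (convex combination), can be sketched in plain Python. This is a generic illustration of the two techniques, not RAGchain's implementation: the function names, the constant `k=60` (a commonly used default for RRF), and the choice of min-max normalization before the weighted sum are all assumptions.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked id lists (best first) by summing 1 / (k + rank)."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


def convex_combination(score_maps, weights):
    """Fuse {id: raw score} maps: min-max normalize each, then weighted sum."""
    fused = {}
    for scores, weight in zip(score_maps, weights):
        if not scores:
            continue
        lo, hi = min(scores.values()), max(scores.values())
        span = hi - lo
        for doc_id, s in scores.items():
            # If all scores are equal, treat every document as a full match.
            norm = (s - lo) / span if span else 1.0
            fused[doc_id] = fused.get(doc_id, 0.0) + weight * norm
    return sorted(fused, key=lambda d: (-fused[d], d))


# Hypothetical outputs from a BM25 retriever and a vector retriever.
rrf_order = reciprocal_rank_fusion([["a", "b", "c"], ["b", "c", "a"]])
cc_order = convex_combination(
    [{"a": 10.0, "b": 8.0, "c": 2.0}, {"a": 0.2, "b": 0.9, "c": 0.8}],
    weights=[0.5, 0.5],
)
```

RRF needs only the rank positions, so it works when the retrievers' scores are not comparable. Convex combination uses the scores themselves, which is why each retriever's scores are normalized to a common range first.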

Maintenance & Community

The project is an early version and welcomes contributions via issues and pull requests. Further community engagement details are not specified in the README.

Licensing & Compatibility

Licensed under the Apache 2.0 License. This license is permissive and generally compatible with commercial use and closed-source linking.

Limitations & Caveats

The README describes the project as an early version, so it may be unstable. No specific limitations or unsupported features are documented beyond this general caveat.

Health Check

  • Last Commit: 1 year ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0

Star History
1 star in the last 30 days
