complex-RAG-guide by FareedKhan-dev

Building advanced RAG systems with LLM agents

Created 2 months ago
283 stars

Top 92.3% on SourcePulse

View on GitHub
Project Summary

A comprehensive guide to building a production-ready Retrieval-Augmented Generation (RAG) system, this repository targets developers and researchers who want to implement complex, real-world RAG pipelines. It offers a step-by-step walkthrough using LangChain and LangGraph, demonstrating advanced techniques to improve accuracy, reduce hallucinations, and raise response quality for challenging use cases.

How It Works

The system orchestrates a sophisticated RAG pipeline involving data preprocessing (chunking, cleaning, logical splitting), multi-source retrieval (book chunks, chapter summaries, quotes), query rewriting, context filtering, and LLM-driven planning and execution. It employs Chain-of-Thought (CoT) reasoning, anonymization/de-anonymization for unbiased planning, and a task handler to select appropriate sub-graphs for retrieval or answering, ensuring robust handling of complex queries and grounding responses in provided context.
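
The repository builds this flow as a LangGraph state graph with dedicated sub-graphs. The sketch below is a minimal, hypothetical illustration of the plan-then-route pattern described above, assuming LangGraph's StateGraph API; the node bodies are placeholders, not the project's actual prompts or sub-graphs.

```python
from typing import List, TypedDict

from langgraph.graph import END, StateGraph


class PipelineState(TypedDict):
    question: str
    plan: List[str]
    context: List[str]
    answer: str


def plan_step(state: PipelineState) -> dict:
    # Placeholder planner: a real implementation would call an LLM with CoT
    # prompting (over an anonymized question) to produce an ordered task list.
    return {"plan": [f"retrieve: {state['question']}", "answer"]}


def route_task(state: PipelineState) -> str:
    # Task handler: decide which sub-graph should handle the next task.
    return "retrieve" if state["plan"][0].startswith("retrieve") else "answer"


def retrieve_step(state: PipelineState) -> dict:
    # Placeholder retrieval: query book chunks, chapter summaries, and quotes,
    # then rewrite the query and filter the retrieved context.
    return {"context": state.get("context", []) + ["<retrieved passage>"],
            "plan": state["plan"][1:]}


def answer_step(state: PipelineState) -> dict:
    # Placeholder answering: generate a response grounded only in the context.
    return {"answer": "<grounded answer>"}


graph = StateGraph(PipelineState)
graph.add_node("planner", plan_step)
graph.add_node("retrieve", retrieve_step)
graph.add_node("answer", answer_step)
graph.set_entry_point("planner")
graph.add_conditional_edges("planner", route_task,
                            {"retrieve": "retrieve", "answer": "answer"})
graph.add_edge("retrieve", "answer")
graph.add_edge("answer", END)

app = graph.compile()
result = app.invoke({"question": "Who destroyed the first Horcrux?",
                     "plan": [], "context": [], "answer": ""})
```

The repository's graph goes further, with sub-graphs for distillation and hallucination reduction and a re-planning loop, but the routing idea is the same.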

Quick Start & Requirements

  • Primary Install/Run: Requires a Python environment with libraries such as LangChain, LangGraph, and PyPDF2 (the standard-library re module is also used). API keys for LLM providers (OpenAI, Together AI, Groq) are necessary.
  • Prerequisites: Python 3.x and API keys for the LLM services. The project uses the Harry Potter books as its dataset, which must be downloaded separately.
  • Setup: Set environment variables for the API keys and install the required Python packages (see the sketch after this list).
  • Links: The README points readers to the author's Medium profile.
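
A minimal setup sketch, assuming the keys are read from environment variables as the walkthrough suggests; the variable names and placeholder values below are illustrative, not taken from the repository.

```python
# pip install langchain langgraph PyPDF2 ragas   (re ships with Python)
import os

# Illustrative placeholders; substitute real keys for the providers you use.
os.environ["OPENAI_API_KEY"] = "sk-..."
os.environ["TOGETHER_API_KEY"] = "..."
os.environ["GROQ_API_KEY"] = "..."
```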

Highlighted Details

  • End-to-end implementation of a complex RAG pipeline.
  • Modular design using LangGraph with sub-graphs for specific functions (retrieval, distillation, hallucination reduction).
  • Advanced LLM-driven features: planning, re-planning, task execution, query rewriting, and context filtering.
  • Multi-source retrieval strategy combining traditional chunks, chapter summaries, and specific quotes.
  • Robust evaluation framework using RAGAS with metrics like faithfulness, answer relevancy, and context recall.
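
A minimal sketch of such an evaluation, assuming a recent RAGAS release and an OPENAI_API_KEY in the environment for the default judge model; the sample record is invented for illustration, not taken from the project.

```python
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import answer_relevancy, context_recall, faithfulness

# One invented example; in practice the questions, generated answers, and
# retrieved contexts come from runs of the pipeline over the book dataset.
records = {
    "question": ["Who gave Harry his first broomstick?"],
    "answer": ["Professor McGonagall arranged for Harry to receive a Nimbus Two Thousand."],
    "contexts": [["A Nimbus Two Thousand arrived for Harry, arranged by Professor McGonagall."]],
    "ground_truth": ["Professor McGonagall"],
}

scores = evaluate(
    Dataset.from_dict(records),
    metrics=[faithfulness, answer_relevancy, context_recall],
)
print(scores)
```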

Maintenance & Community

The project acknowledges foundational work by nirDiamant and encourages following the author, Fareed Khan, on Medium. No community channels (e.g., Discord, Slack) and no detailed contributor information are mentioned in the README.

Licensing & Compatibility

The licensing information is not specified in the provided README content. Compatibility for commercial use or closed-source linking would require clarification on the project's license.

Limitations & Caveats

The project relies heavily on multiple LLM providers, necessitating API keys and potentially incurring costs. The complexity of the pipeline, while powerful, may present a steep learning curve for users unfamiliar with LangChain, LangGraph, and advanced RAG concepts. The effectiveness is also dependent on the quality and availability of the underlying LLM models used.

Health Check
  • Last Commit: 2 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 1
  • Star History: 52 stars in the last 30 days

Explore Similar Projects

Starred by Li Jiang (Coauthor of AutoGen; Engineer at Microsoft), Elie Bursztein (Cybersecurity Lead at Google DeepMind), and 1 more.

AutoRAG by Marker-Inc-Korea: RAG AutoML tool for optimizing RAG pipelines
  • Top 0.3% on SourcePulse, 4k stars
  • Created 1 year ago; updated 2 days ago