RAG-Book by Nipi64310

RAG system evolution and implementation

Created 1 year ago

268 stars

Top 95.9% on SourcePulse

Project Summary

This repository serves as a code and resource compilation for the book "Practical RAG for Large Models." It targets developers and researchers interested in advancing Retrieval-Augmented Generation (RAG) systems beyond basic implementations, offering practical techniques and code examples for building sophisticated RAG pipelines.

How It Works

The project groups RAG advancements into three stages: Basic, Advanced, and Super.

Basic RAG: A simple pipeline built on fixed-length chunking and retrieval.
Advanced RAG: Optimizes retrieval and generation through query enhancement (rewriting, step-back prompting, sub-questions), chunk enhancement (compression, guessing questions), retrieval enhancement (mixed, i.e. hybrid, search and metadata filtering), and generation enhancement (traceability).
Super RAG: Covers Agentic RAG, multimodal RAG, structured RAG, and GraphRAG, using LLMs as decision-making engines and graph structures for richer context.
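As a rough illustration of the Basic stage, the sketch below chunks a text into fixed-length windows, retrieves the best-matching chunks, and assembles a prompt. It is not taken from the repository: the function names (`chunk_fixed`, `retrieve`, `build_prompt`) are hypothetical, and a bag-of-words cosine similarity stands in for a real embedding model so that the example runs with the standard library alone.

```python
import math
import re
from collections import Counter


def chunk_fixed(text: str, size: int = 40, overlap: int = 10) -> list[str]:
    """Split text into fixed-length character windows that overlap slightly."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]


def bag_of_words(text: str) -> Counter:
    """Lowercased token counts (assumption: a stand-in for an embedding model)."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query."""
    q = bag_of_words(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, bag_of_words(c)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, context: list[str]) -> str:
    """Assemble the text that would be sent to an LLM (the LLM call is omitted)."""
    joined = "\n".join(f"- {c}" for c in context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {query}"


if __name__ == "__main__":
    chunks = ["cats sleep all day", "graphs connect entities", "dogs bark loudly"]
    top = retrieve("how do graphs connect", chunks, k=1)
    print(build_prompt("how do graphs connect", top))
```

The Advanced and Super stages can be read as replacements for individual parts of this pipeline: smarter chunking, query rewriting before `retrieve`, or an LLM deciding when to call `retrieve` at all.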

Quick Start & Requirements

Installation: Clone the repository and set up a Python environment. Commands for running individual components (e.g., training embedding models, fine-tuning LLMs) are detailed within their respective directories.
Prerequisites: Python 3.x, and potentially specific libraries such as LangChain. A GPU with CUDA is likely required for model training and fine-tuning. Some sections may require API keys for closed-source models.
Resources: Training embedding models and fine-tuning LLMs can be resource-intensive, requiring significant GPU memory and time.
Links: The repository layout suggests code examples for individual RAG techniques, with chapters dedicated to LangChain demos and data preparation.

Highlighted Details

Explores advanced RAG techniques, including query and chunk enhancement, retrieval strategies (mixed, i.e. hybrid, search and metadata filtering), and generation improvements (traceability).
Covers Agentic RAG, where LLMs act as engines that plan and execute tasks, using retrieval as a tool.
Includes sections on multimodal RAG and on GraphRAG, including modifying existing GraphRAG implementations for Chinese prompts and private models.
Details different model training approaches for RAG systems: Independent Training, Sequential Training (LLM First or Retriever First), and Joint Training.

Maintenance & Community

The project accompanies the book "大模型RAG实战" (Practical RAG for Large Models), which suggests that development follows the book's structure and that community engagement may center on its content. Community links (Discord, Slack) are not mentioned in the provided README excerpt.

Licensing & Compatibility

The licensing information is not detailed in the provided README excerpt. Compatibility for commercial use would depend on the specific licenses of the underlying models and libraries used.

Limitations & Caveats

Some sections are marked as "待更新~" (to be updated), indicating that certain advanced RAG techniques (e.g., retrieval enhancement, generation enhancement) are still under development or documentation. The project's focus on specific Chinese models might limit direct applicability for users relying solely on English-centric ecosystems.

Health Check

Last Commit: 1 year ago
Responsiveness: Inactive
Pull Requests (30d): 0
Issues (30d): 0

Star History

5 stars in the last 30 days

Explore Similar Projects

awesome-rag by coree

Curated list of resources for retrieval-augmented generation (RAG) in LLMs

Created 1 year ago

Updated 1 month ago

Awesome-RAG by lucifertrj

Awesome RAG examples and tutorials

Created 2 years ago

Updated 2 months ago

awesome-rag by awesome-rag

Curated list of RAG papers and systems

Created 1 year ago

Updated 5 months ago

Starred by Vasek Mlejnsky (Cofounder of E2B), Shyamal Anadkat (Research Scientist at OpenAI), and 6 more.

synthesizer by SciPhi-AI

LLM framework for RAG and data creation

Created 2 years ago

Updated 2 years ago

Awesome-RAG by Danielskry

Awesome list of RAG resources

Created 1 year ago

Updated 2 months ago

Advanced_RAG by NisaarAgharia

Python notebooks for advanced RAG techniques

Created 1 year ago

Updated 1 year ago

Starred by Travis Fischer (Founder of Agentic) and Elvis Saravia (Founder of DAIR.AI).

RAG-Survey by hymie122

RAG paper collection for AI-Generated Content

Created 1 year ago

Updated 1 year ago

FlashRAG by RUC-NLPIR

Python toolkit for efficient RAG research

Created 1 year ago

Updated 1 month ago

all-rag-techniques by FareedKhan-dev

Jupyter notebooks for RAG technique implementations

Created 10 months ago

Updated 6 months ago

Starred by Harrison Chase (Founder of LangChain), Meng Zhang (Cofounder of TabbyML), and 2 more.

rag-cookbooks by athina-ai

RAG cookbooks for advanced techniques

Created 1 year ago

Updated 10 months ago

rag-zero-to-hero-guide by KalyanKS-NLP

RAG learning guide, from basics to advanced

Created 10 months ago

Updated 9 months ago

all-in-rag by datawhalechina

RAG development guide

Created 7 months ago

Updated 2 days ago
