bRAG-langchain  by bragai

RAG exploration notebooks

created 8 months ago
2,957 stars

Top 16.5% on sourcepulse

Project Summary

This repository is a hands-on, notebook-based guide to building Retrieval-Augmented Generation (RAG) applications with LangChain. It targets developers and researchers who want to implement RAG techniques ranging from a basic setup to multi-querying, routing, and re-ranking.

How It Works

The project guides users through building RAG pipelines from LangChain's modular components. It demonstrates how to set up data loaders, generate embeddings (including OpenAI and ColBERT), and configure vector stores (ChromaDB, Pinecone), then implements retrieval strategies such as multi-querying, semantic routing, and RAG-Fusion with Reciprocal Rank Fusion (RRF). The notebooks are meant to be worked through in order, and the later ones cover advanced indexing and re-ranking to improve relevance and scalability.
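As an illustration of the fusion step in RAG-Fusion, here is a minimal sketch of Reciprocal Rank Fusion in plain Python. It is not taken from the repository's notebooks: the function name, the sample document IDs, and the smoothing constant k=60 (a commonly used default) are assumptions made for this example. Each ranked list would typically come from retrieving with one rewritten version of the user's query.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document receives 1 / (k + rank) from every list it appears in,
    where rank starts at 1. k=60 is an assumed, commonly used default.
    """
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document ID.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical results for three rewrites of the same question.
rankings = [
    ["doc_a", "doc_b", "doc_c"],
    ["doc_b", "doc_a", "doc_d"],
    ["doc_b", "doc_c", "doc_a"],
]
fused = reciprocal_rank_fusion(rankings)
fused_ids = [doc_id for doc_id, _ in fused]
```

Documents that rank well in several lists accumulate the highest fused score, so the method rewards agreement between queries without needing the retrievers' raw similarity scores to be comparable.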

Quick Start & Requirements

  • Install: Clone the repo, create a Python 3.11.11 virtual environment, and run pip install -r requirements.txt.
  • Prerequisites: Python 3.11.11, OpenAI API key, LangSmith API key, Pinecone API key, Cohere API key (optional).
  • Setup: Requires setting API keys in a .env file. Notebooks should be run sequentially.
  • Docs: Notebooks
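Before running the notebooks, it can help to confirm that the .env file contains the expected keys. The sketch below uses only the standard library. The variable names (OPENAI_API_KEY, LANGCHAIN_API_KEY, PINECONE_API_KEY, COHERE_API_KEY) are assumptions based on common conventions, not names confirmed by this summary, so check them against the repository's own example file.

```python
# Assumed variable names; the repository may use different ones.
REQUIRED_KEYS = ["OPENAI_API_KEY", "LANGCHAIN_API_KEY", "PINECONE_API_KEY"]
OPTIONAL_KEYS = ["COHERE_API_KEY"]


def parse_env_text(text):
    """Parse simple KEY=VALUE lines, skipping blanks and '#' comments."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        env[key.strip()] = value.strip().strip('"').strip("'")
    return env


def missing_keys(env, required=REQUIRED_KEYS):
    """Return the required keys that are absent or empty."""
    return [key for key in required if not env.get(key)]


# Placeholder contents standing in for a real .env file.
sample = """
# placeholder values only
OPENAI_API_KEY="sk-placeholder"
PINECONE_API_KEY=pc-placeholder
"""
env = parse_env_text(sample)
missing = missing_keys(env)
```

With a real file, the text would come from reading .env in the repository root. In practice a library such as python-dotenv is commonly used to load these values into the process environment.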

Highlighted Details

  • Detailed notebooks cover RAG from basic setup to advanced techniques like multi-querying, semantic routing, and query structuring.
  • Explores advanced indexing and retrieval methods, including RAPTOR (hierarchical indexing over a tree of summaries) and ColBERT (token-level vector indexing).
  • Demonstrates re-ranking strategies like Reciprocal Rank Fusion (RRF) and Cohere's re-ranking model.
  • Includes examples for integrating metadata filters and structured search prompting.
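To make "token-level vector indexing" concrete, the following sketch shows the late-interaction (MaxSim) scoring idea associated with ColBERT: each query token is matched to its most similar document token, and those maxima are summed. It is a toy illustration with hand-written 2-dimensional vectors, not the repository's code or the ColBERT library's API. The function names and sample vectors are assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def maxsim_score(query_tokens, doc_tokens):
    """Sum, over query tokens, of the best match among document tokens."""
    return sum(
        max((cosine(q, d) for d in doc_tokens), default=0.0)
        for q in query_tokens
    )


# Toy token embeddings (hypothetical values, one vector per token).
query = [[1.0, 0.0], [0.0, 1.0]]
doc_close = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
doc_far = [[1.0, 0.0], [-1.0, 0.0]]

score_close = maxsim_score(query, doc_close)
score_far = maxsim_score(query, doc_far)
```

Because every token keeps its own vector, a document can score well when different parts of it match different query terms, which a single pooled embedding per document may blur.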

Maintenance & Community

  • Contact: Taha Ababou (taha@bragai.dev) for questions or collaboration.
  • Inspiration: Lance Martin's LangChain Tutorial.

Licensing & Compatibility

  • License: Not explicitly stated in the README.
  • Compatibility: Requires a specific Python version (3.11.11) and API keys for core functionality.

Limitations & Caveats

The project depends on external services (OpenAI, Pinecone, and optionally Cohere), so running the notebooks requires API keys and usage can incur costs. No license is specified, which leaves the terms for commercial use unclear. The README mentions "bRAGAI is coming soon," suggesting potential future productization or a shift in focus.

Health Check

  • Last commit: 2 weeks ago
  • Responsiveness: 1 day
  • Pull requests (30d): 0
  • Issues (30d): 0

Star History

140 stars in the last 90 days
