ragbuilder by KruxAI

Toolkit for automating production RAG pipelines

Created 1 year ago

1,527 stars

Top 26.7% on SourcePulse

Project Summary

This toolkit automates the creation of production-ready Retrieval-Augmented Generation (RAG) systems through hyperparameter tuning and pre-defined templates. It targets developers and researchers who need to set up a well-tuned RAG pipeline for their own data quickly, and it aims to save time and improve accuracy compared with manual configuration.

How It Works

RagBuilder employs Bayesian optimization to systematically tune RAG parameters like chunking strategies, chunk sizes, retriever types, and rerankers. It evaluates configurations against a provided or synthetically generated test dataset to identify optimal settings. The toolkit also provides access to state-of-the-art RAG components and templates, allowing users to integrate advanced techniques like graph retrieval or semantic chunking.
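The tuning loop described above can be illustrated with a small, self-contained sketch. It is not RagBuilder's API: the search space, the parameter names, and the scoring function are hypothetical, and plain random search stands in for Bayesian optimization to keep the example dependency-free. A Bayesian optimizer would use earlier trial scores to choose the next configuration instead of sampling uniformly.

```python
import random

# Hypothetical search space. Names and values are illustrative only,
# not RagBuilder's actual configuration options.
SEARCH_SPACE = {
    "chunking_strategy": ["recursive", "semantic", "markdown"],
    "chunk_size": [256, 512, 1024, 2048],
    "retriever": ["vector", "bm25", "graph"],
    "reranker": [None, "cross_encoder"],
}


def sample_config(rng):
    """Draw one configuration uniformly at random from the search space."""
    return {name: rng.choice(options) for name, options in SEARCH_SPACE.items()}


def evaluate(config):
    """Toy scoring function (an assumption for this sketch).

    A real evaluation would build the pipeline described by `config`,
    run it on the test dataset, and return a quality metric.
    """
    score = 1.0 - abs(config["chunk_size"] - 512) / 2048
    if config["reranker"] is not None:
        score += 0.1
    return score


def search(n_trials, seed=0):
    """Random search: keep the best-scoring configuration seen so far."""
    rng = random.Random(seed)
    best_config, best_score = None, float("-inf")
    for _ in range(n_trials):
        config = sample_config(rng)
        score = evaluate(config)
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score
```

Calling `search(50)` returns the best configuration found in 50 trials together with its score. Replacing `evaluate` with a function that scores a real pipeline against a test dataset is the part that makes the search meaningful.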

Quick Start & Requirements

Install via pip: pip install ragbuilder
Requires Python 3.x.
API keys for LLM and embedding providers (e.g., OpenAI, Azure OpenAI, Cohere) are necessary and can be configured via environment variables or directly in code.
See Quick Start and Configuration Guide.
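
Since provider keys can come from environment variables, a quick pre-flight check can catch a missing key before a long tuning run starts. The helper below is a minimal sketch and not part of RagBuilder. The function name `missing_keys` is hypothetical, and `OPENAI_API_KEY` is the variable name conventionally read by OpenAI clients, so confirm the exact names in the Configuration Guide.

```python
import os


def missing_keys(required, env=None):
    """Return the names in `required` that are unset or empty in `env`.

    `env` defaults to the process environment; passing a dict makes the
    helper easy to test.
    """
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


# Example usage (variable name is an assumption, see the note above):
# missing = missing_keys(["OPENAI_API_KEY"])
# if missing:
#     raise SystemExit(f"Set these environment variables first: {missing}")
```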

Highlighted Details

Supports hyperparameter tuning for data ingestion, retrieval, and generation modules.
Offers a variety of pre-defined document loaders, chunking strategies, retrievers (including graph-based), and rerankers.
Enables deployment as an API service and persistence of optimized RAG pipelines.
Includes options for custom evaluation metrics and synthetic test data generation.

Maintenance & Community

Actively maintained with contributions welcomed.
Usage analytics are collected anonymously by default; opt-out available via .env.

Licensing & Compatibility

Licensed under the MIT License, permitting commercial use and integration with closed-source projects.

Limitations & Caveats

Graph-based retrieval requires a Neo4j instance and associated credentials.
Optimization results depend on the quality and representativeness of the provided test data.

Health Check

Last Commit

9 months ago

Responsiveness

1 week

Star History

7 stars in the last 30 days