ragbuilder by KruxAI

RAG toolkit for production RAG pipeline automation

created 1 year ago
1,451 stars

Top 28.8% on sourcepulse

Project Summary

This toolkit automates the creation of production-ready Retrieval-Augmented Generation (RAG) systems through hyperparameter tuning and pre-defined templates. It targets developers and researchers who need to stand up high-performing RAG pipelines for their own data quickly, and it aims to save time and improve accuracy compared with manual configuration.

How It Works

RagBuilder employs Bayesian optimization to systematically tune RAG parameters like chunking strategies, chunk sizes, retriever types, and rerankers. It evaluates configurations against a provided or synthetically generated test dataset to identify optimal settings. The toolkit also provides access to state-of-the-art RAG components and templates, allowing users to integrate advanced techniques like graph retrieval or semantic chunking.
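
The tuning loop described above can be sketched in a few lines. This is a minimal illustration, not RagBuilder's API: the search space, the option names, and the toy scoring table are all hypothetical. To stay dependency-free it uses plain random search as a stand-in for Bayesian optimization. A Bayesian optimizer would replace the random sampling step with a model that proposes the next configuration based on the scores seen so far.

```python
import random

# Hypothetical search space; option names are illustrative, not RagBuilder's.
SEARCH_SPACE = {
    "chunking_strategy": ["fixed", "recursive", "semantic"],
    "chunk_size": [256, 512, 1024, 2048],
    "retriever": ["vector", "bm25", "hybrid"],
    "reranker": [None, "cross_encoder"],
}

# Made-up per-option scores so the sketch runs without a real pipeline.
# A real evaluation would build the pipeline and score it on the test dataset.
_TOY_WEIGHTS = {
    "chunking_strategy": {"fixed": 0.10, "recursive": 0.20, "semantic": 0.25},
    "chunk_size": {256: 0.10, 512: 0.25, 1024: 0.20, 2048: 0.05},
    "retriever": {"vector": 0.15, "bm25": 0.10, "hybrid": 0.25},
    "reranker": {None: 0.05, "cross_encoder": 0.25},
}


def sample_config(rng):
    """Draw one configuration uniformly at random from the search space."""
    return {name: rng.choice(options) for name, options in SEARCH_SPACE.items()}


def evaluate(config):
    """Toy stand-in for scoring a configuration against a test dataset."""
    return sum(_TOY_WEIGHTS[name][value] for name, value in config.items())


def search(n_trials=30, seed=0):
    """Random search: sample configurations, score each, keep the best."""
    rng = random.Random(seed)
    trials = []
    for _ in range(n_trials):
        config = sample_config(rng)
        trials.append((evaluate(config), config))
    best_score, best_config = max(trials, key=lambda trial: trial[0])
    return best_config, best_score, trials


if __name__ == "__main__":
    best_config, best_score, _ = search()
    print(best_config, round(best_score, 2))
```

The structure stays the same whichever optimizer is used: propose a configuration, score it on the test set, and keep the best one seen.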

Quick Start & Requirements

  • Install via pip: pip install ragbuilder
  • Requires Python 3.x.
  • API keys for LLM and embedding providers (e.g., OpenAI, Azure OpenAI, Cohere) are necessary and can be configured via environment variables or directly in code.
  • See Quick Start and Configuration Guide.
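
Since provider API keys can be supplied through environment variables, a small pre-flight check can catch a missing key before a long tuning run starts. The sketch below is an assumption-laden illustration: `missing_keys` is a hypothetical helper, and `OPENAI_API_KEY` is the variable name commonly read by OpenAI client libraries. The exact variables RagBuilder expects should be taken from its Configuration Guide.

```python
import os

# Assumption: only an OpenAI key is needed; adjust for your providers.
REQUIRED_KEYS = ["OPENAI_API_KEY"]


def missing_keys(required, env=None):
    """Return the names in `required` that are unset or empty in `env`."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


if __name__ == "__main__":
    missing = missing_keys(REQUIRED_KEYS)
    if missing:
        raise SystemExit("Missing environment variables: " + ", ".join(missing))
    print("All required API keys are set.")
```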

Highlighted Details

  • Supports hyperparameter tuning for data ingestion, retrieval, and generation modules.
  • Offers a variety of pre-defined document loaders, chunking strategies, retrievers (including graph-based), and rerankers.
  • Enables deployment as an API service and persistence of optimized RAG pipelines.
  • Includes options for custom evaluation metrics and synthetic test data generation.

Maintenance & Community

  • Actively maintained; contributions are welcome.
  • Usage analytics are collected anonymously by default; opt-out available via .env.

Licensing & Compatibility

  • Licensed under the MIT License, permitting commercial use and integration with closed-source projects.

Limitations & Caveats

  • Graph-based retrieval requires a Neo4j instance and associated credentials.
  • Pipeline performance and optimization results depend on the quality and representativeness of the provided test data.

Health Check

  • Last commit: 2 months ago
  • Responsiveness: 1 week
  • Pull requests (30d): 0
  • Issues (30d): 1
  • Star history: 47 stars in the last 90 days
