RAG application guide for production
This repository provides a guide and a working implementation for building production-ready Retrieval-Augmented Generation (RAG) LLM applications. It targets engineers and researchers who want to develop, scale, and optimize RAG systems, and it offers a structured approach to productionizing LLM workflows.
How It Works
The project guides users through developing RAG applications from scratch, with a focus on scaling the key components: data loading, chunking, embedding, indexing, and serving. It emphasizes evaluating different configurations for performance and quality, implementing hybrid routing between open-source and closed-source LLMs, and deploying scalable, highly available applications. Ray provides the distributed computing layer that handles the scaling requirements of these components.
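The chunk, embed, index, and retrieve stages described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the repository's implementation: the helper names (`chunk_text`, `embed`, `build_index`, `retrieve`) are hypothetical, the word-count "embedding" is a toy stand-in for a real embedding model, and the in-memory list stands in for a vector store. The sketch runs on a single process; in the guide, Ray distributes these stages.

```python
import math
import re
from collections import Counter


def chunk_text(text, chunk_size=80, overlap=20):
    """Split text into overlapping character windows (hypothetical helper)."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]


def embed(text):
    """Toy embedding: lowercase word counts.

    Assumption: a real system would call an embedding model here.
    """
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(docs, chunk_size=80, overlap=20):
    """Chunk every document and store (source, chunk, embedding) tuples.

    Assumption: an in-memory list stands in for a vector database.
    """
    index = []
    for source, text in docs.items():
        for chunk in chunk_text(text, chunk_size, overlap):
            index.append((source, chunk, embed(chunk)))
    return index


def retrieve(index, query, k=2):
    """Return the k chunks most similar to the query."""
    query_vec = embed(query)
    ranked = sorted(index, key=lambda entry: cosine(query_vec, entry[2]), reverse=True)
    return [(source, chunk) for source, chunk, _ in ranked[:k]]


if __name__ == "__main__":
    docs = {
        "serving": "Ray Serve deploys models behind an HTTP endpoint.",
        "storage": "PostgreSQL stores the embedding vectors for retrieval.",
    }
    index = build_index(docs)
    for source, chunk in retrieve(index, "where are vectors stored", k=1):
        print(source, "->", chunk)
```

The retrieved chunks would then be passed to an LLM as context. Chunk size, overlap, and `k` are exactly the kind of configuration choices the guide suggests evaluating rather than fixing up front.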
Quick Start & Requirements
Install dependencies: pip install --user -r requirements.txt
Cluster environment: default_cluster_env_2.6.2_py39
Database: PostgreSQL
Instance type: g3.8xlarge
Data path: /efs/shared_storage/goku/docs.ray.io/en/master/ (on Anyscale staging)
Highlighted Details
Maintenance & Community
This project is part of the Ray ecosystem, developed by Anyscale. Further information on Ray and Anyscale can be found on their respective documentation sites and through Anyscale's community channels.
Licensing & Compatibility
The repository is licensed under the Apache License 2.0. This license is permissive and generally compatible with commercial and closed-source applications.
Limitations & Caveats
The guide assumes familiarity with LLM concepts and distributed computing frameworks like Ray. While local execution is possible, a GPU-accelerated environment is highly recommended for practical performance and scaling demonstrations.