llm-applications by ray-project

RAG application guide for production

created 1 year ago
1,813 stars

Top 24.4% on sourcepulse

Project Summary

This repository provides a comprehensive guide and practical implementation for building production-ready Retrieval Augmented Generation (RAG) based LLM applications. It targets engineers and researchers looking to develop, scale, and optimize RAG systems, offering a structured approach to productionizing LLM workflows.

How It Works

The project guides users through developing RAG applications from scratch, focusing on scaling the key pipeline components: data loading, chunking, embedding, indexing, and serving. It emphasizes evaluating different configurations for performance and quality, implementing hybrid routing between open-source (OSS) and closed LLMs, and deploying scalable, highly available applications. The approach leverages Ray for distributed computing to handle the scaling requirements of these components.
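The pipeline stages above (load, chunk, embed, index, retrieve) can be sketched in plain Python. The fixed-size chunker and bag-of-words "embedding" below are illustrative stand-ins, not the repo's implementation (which uses Ray Data with real embedding models); all function names here are hypothetical.

```python
import math
import re
from collections import Counter

def chunk(text, size=300, overlap=50):
    """Split text into fixed-size character chunks with overlap."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

def embed(text):
    """Toy bag-of-words vector (word -> count); a stand-in for a real model."""
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a if w in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, index, k=2):
    """Rank indexed chunks by similarity to the query embedding."""
    q = embed(query)
    scored = sorted(index, key=lambda c: cosine(q, c["vec"]), reverse=True)
    return [c["text"] for c in scored[:k]]

# Build a tiny index and query it.
docs = ["Ray Data scales chunking and embedding across a cluster.",
        "Serve deploys the application behind an HTTP endpoint."]
index = [{"text": c, "vec": embed(c)} for d in docs for c in chunk(d)]
print(retrieve("how do I scale embedding?", index, k=1))
```

A production pipeline would swap the toy embedding for a model served via Ray, and the in-memory list for a vector store such as the PostgreSQL database listed in the prerequisites.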

Quick Start & Requirements

  • Install: pip install --user -r requirements.txt
  • Prerequisites: OpenAI and Anyscale Endpoints API keys, Python 3.9 (implied by default_cluster_env_2.6.2_py39), PostgreSQL database.
  • Compute: Recommended GPU access (e.g., Anyscale g3.8xlarge instance).
  • Data: Example data available at /efs/shared_storage/goku/docs.ray.io/en/master/ on Anyscale staging.
  • Links: Blog Post, Notebook, Anyscale Endpoints

Highlighted Details

  • Develop RAG applications from scratch.
  • Scale major components (load, chunk, embed, index, serve).
  • Evaluate configurations for performance and quality.
  • Implement LLM hybrid routing.
  • Serve applications scalably and with high availability.
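The hybrid-routing idea in the list above can be illustrated with a minimal router that scores each query and sends easy ones to a cheaper OSS model and hard ones to a closed model. The scoring heuristic, threshold, and backend callables are hypothetical stand-ins, not the repo's actual router.

```python
def route(query, score_fn, oss_llm, closed_llm, threshold=0.5):
    """Dispatch to the OSS model when the router is confident it can
    handle the query; otherwise fall back to the closed model."""
    score = score_fn(query)  # e.g. a small classifier's probability
    backend = oss_llm if score >= threshold else closed_llm
    return backend(query), ("oss" if backend is oss_llm else "closed")

# Hypothetical stand-ins: a keyword heuristic and echo backends.
score_fn = lambda q: 1.0 if "ray" in q.lower() else 0.0
oss = lambda q: f"[oss] {q}"
closed = lambda q: f"[closed] {q}"

print(route("How do I scale with Ray?", score_fn, oss, closed))
print(route("Summarize this legal contract.", score_fn, oss, closed))
```

In practice the score would come from a trained classifier evaluated against the quality benchmarks the guide describes, so routing decisions trade cost against answer quality explicitly.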

Maintenance & Community

This project is part of the Ray ecosystem, developed by Anyscale. Further information on Ray and Anyscale can be found on their respective documentation sites and through Anyscale's community channels.

Licensing & Compatibility

The repository is licensed under the Apache License 2.0. This license is permissive and generally compatible with commercial and closed-source applications.

Limitations & Caveats

The guide assumes familiarity with LLM concepts and distributed computing frameworks like Ray. While local execution is possible, a GPU-accelerated environment is highly recommended for practical performance and scaling demonstrations.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 26 stars in the last 90 days

Explore Similar Projects

Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), Philipp Schmid (DevRel at Google DeepMind), and 2 more.

LightLLM by ModelTC

Top 0.7% on sourcepulse
3k stars
Python framework for LLM inference and serving
created 2 years ago, updated 15 hours ago
Starred by Andrej Karpathy (Founder of Eureka Labs; formerly at Tesla, OpenAI; author of CS 231n), Tobi Lutke (Cofounder of Shopify), and 27 more.

vllm by vllm-project

Top 1.0% on sourcepulse
54k stars
LLM serving engine for high-throughput, memory-efficient inference
created 2 years ago, updated 14 hours ago