llm-applications by ray-project

RAG application guide for production

Created 2 years ago
1,831 stars

Top 23.6% on SourcePulse

Project Summary

This repository provides a comprehensive guide and practical implementation for building production-ready Retrieval Augmented Generation (RAG) based LLM applications. It targets engineers and researchers looking to develop, scale, and optimize RAG systems, offering a structured approach to productionizing LLM workflows.

How It Works

The project guides users through developing RAG applications from scratch, focusing on scaling the key pipeline components: data loading, chunking, embedding, indexing, and serving. It emphasizes evaluating different configurations for performance and quality, implementing hybrid routing across open-source (OSS) and closed LLMs, and deploying scalable, highly available applications. Ray provides the distributed computing layer that handles the scaling requirements of each component.
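To make the pipeline concrete, here is a minimal sketch of the chunking step: splitting documents into fixed-size, overlapping chunks before embedding. The repo scales this stage with Ray; this version is plain Python so the logic is easy to follow, and the function name and parameters are illustrative rather than taken from the repo.

```python
# Illustrative chunking step for a RAG pipeline (not the repo's exact code):
# split text into fixed-size chunks with overlap so context is preserved
# across chunk boundaries.
def chunk_text(text: str, chunk_size: int = 300, overlap: int = 50) -> list[str]:
    """Split text into chunks of up to chunk_size characters, overlapping by overlap."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must exceed overlap")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
    return chunks

doc = "Ray is a framework for scaling Python workloads. " * 20
chunks = chunk_text(doc, chunk_size=200, overlap=40)
```

In the guide, a step like this is distributed across a cluster (e.g. with Ray Data) so millions of documents can be chunked and embedded in parallel.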

Quick Start & Requirements

  • Install: pip install --user -r requirements.txt
  • Prerequisites: OpenAI and Anyscale Endpoints API keys, Python 3.9 (implied by default_cluster_env_2.6.2_py39), PostgreSQL database.
  • Compute: Recommended GPU access (e.g., Anyscale g3.8xlarge instance).
  • Data: Example data available at /efs/shared_storage/goku/docs.ray.io/en/master/ on Anyscale staging.
  • Links: Blog Post, Notebook, Anyscale Endpoints

Highlighted Details

  • Develop RAG applications from scratch.
  • Scale major components (load, chunk, embed, index, serve).
  • Evaluate configurations for performance and quality.
  • Implement LLM hybrid routing.
  • Serve applications scalably and with high availability.
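The hybrid-routing idea above can be sketched in a few lines: send a query to a closed model only when it looks too hard for the OSS model, and fall back to the cheaper OSS model otherwise. The guide uses evaluation-driven routing; the heuristic below is a hypothetical stand-in for that decision logic, and the backend names are placeholders.

```python
# Hypothetical sketch of hybrid LLM routing (a stand-in, not the repo's
# method): pick a backend per query based on a simple difficulty check.
def route_query(query: str, hard_keywords=("architecture", "tradeoff", "debug")) -> str:
    """Return the backend ("oss-llm" or "closed-llm") that should answer."""
    # Treat long queries or queries containing "hard" keywords as difficult.
    is_hard = len(query.split()) > 30 or any(k in query.lower() for k in hard_keywords)
    return "closed-llm" if is_hard else "oss-llm"
```

A production router would replace this heuristic with a model scored against evaluation data, but the control flow (classify, then dispatch) stays the same.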

Maintenance & Community

This project is part of the Ray ecosystem, developed by Anyscale. Further information on Ray and Anyscale can be found on their respective documentation sites and through Anyscale's community channels.

Licensing & Compatibility

The repository is licensed under the Apache License 2.0. This license is permissive and generally compatible with commercial and closed-source applications.

Limitations & Caveats

The guide assumes familiarity with LLM concepts and distributed computing frameworks like Ray. While local execution is possible, a GPU-accelerated environment is highly recommended for practical performance and scaling demonstrations.

Health Check

  • Last Commit: 1 year ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 19 stars in the last 30 days

