llm-applications by ray-project

RAG application guide for production

created 1 year ago
1,813 stars

Top 24.4% on sourcepulse

Project Summary

This repository provides a comprehensive guide and practical implementation for building production-ready Retrieval Augmented Generation (RAG) based LLM applications. It targets engineers and researchers looking to develop, scale, and optimize RAG systems, offering a structured approach to productionizing LLM workflows.

How It Works

The project guides users through developing RAG applications from scratch, focusing on scaling the key pipeline components: data loading, chunking, embedding, indexing, and serving. It emphasizes evaluating different configurations for performance and quality, implementing hybrid routing between open-source (OSS) and closed LLMs, and deploying scalable, highly available applications. The approach leverages Ray for distributed computing to handle the scaling requirements of these components.
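The pipeline stages above (load, chunk, embed, index, retrieve) can be sketched in plain Python. The fixed-size chunker and bag-of-words "embedding" below are illustrative stand-ins, not the repo's implementation (which uses Ray Data with real embedding models); all function names here are hypothetical.

```python
import math
import re
from collections import Counter

def chunk(text, size=300, overlap=50):
    """Split text into fixed-size character chunks with overlap."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

def embed(text):
    """Toy bag-of-words vector (word -> count); a stand-in for a real model."""
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a if w in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, index, k=2):
    """Rank indexed chunks by similarity to the query embedding."""
    q = embed(query)
    scored = sorted(index, key=lambda c: cosine(q, c["vec"]), reverse=True)
    return [c["text"] for c in scored[:k]]

# Build a tiny index and query it.
docs = ["Ray Data scales chunking and embedding across a cluster.",
        "Serve deploys the application behind an HTTP endpoint."]
index = [{"text": c, "vec": embed(c)} for d in docs for c in chunk(d)]
print(retrieve("how do I scale embedding?", index, k=1))
```

A production pipeline would swap the toy embedding for a model served via Ray, and the in-memory list for a vector store such as the PostgreSQL database listed in the prerequisites.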

Quick Start & Requirements

  • Install: pip install --user -r requirements.txt
  • Prerequisites: OpenAI and Anyscale Endpoints API keys, Python 3.9 (implied by default_cluster_env_2.6.2_py39), PostgreSQL database.
  • Compute: Recommended GPU access (e.g., Anyscale g3.8xlarge instance).
  • Data: Example data available at /efs/shared_storage/goku/docs.ray.io/en/master/ on Anyscale staging.
  • Links: Blog Post, Notebook, Anyscale Endpoints

Highlighted Details

  • Develop RAG applications from scratch.
  • Scale major components (load, chunk, embed, index, serve).
  • Evaluate configurations for performance and quality.
  • Implement LLM hybrid routing.
  • Serve applications scalably and with high availability.
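The hybrid-routing idea in the list above can be illustrated with a minimal router that scores each query and sends easy ones to a cheaper OSS model and hard ones to a closed model. The scoring heuristic, threshold, and backend callables are hypothetical stand-ins, not the repo's actual router.

```python
def route(query, score_fn, oss_llm, closed_llm, threshold=0.5):
    """Dispatch to the OSS model when the router is confident it can
    handle the query; otherwise fall back to the closed model."""
    score = score_fn(query)  # e.g. a small classifier's probability
    backend = oss_llm if score >= threshold else closed_llm
    return backend(query), ("oss" if backend is oss_llm else "closed")

# Hypothetical stand-ins: a keyword heuristic and echo backends.
score_fn = lambda q: 1.0 if "ray" in q.lower() else 0.0
oss = lambda q: f"[oss] {q}"
closed = lambda q: f"[closed] {q}"

print(route("How do I scale with Ray?", score_fn, oss, closed))
print(route("Summarize this legal contract.", score_fn, oss, closed))
```

In practice the score would come from a trained classifier evaluated against the quality benchmarks the guide describes, so routing decisions trade cost against answer quality explicitly.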

Maintenance & Community

This project is part of the Ray ecosystem, developed by Anyscale. Further information on Ray and Anyscale can be found on their respective documentation sites and through Anyscale's community channels.

Licensing & Compatibility

The repository is licensed under the Apache License 2.0. This license is permissive and generally compatible with commercial and closed-source applications.

Limitations & Caveats

The guide assumes familiarity with LLM concepts and distributed computing frameworks like Ray. While local execution is possible, a GPU-accelerated environment is highly recommended for practical performance and scaling demonstrations.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 26 stars in the last 90 days

Explore Similar Projects

Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), Philipp Schmid (DevRel at Google DeepMind), and 2 more.

LightLLM by ModelTC

Top 0.7% on sourcepulse
3k stars
Python framework for LLM inference and serving
created 2 years ago, updated 15 hours ago
Starred by Andrej Karpathy (Founder of Eureka Labs; formerly at Tesla, OpenAI; author of CS 231n), Tobi Lutke (Cofounder of Shopify), and 27 more.

vllm by vllm-project

Top 1.0% on sourcepulse
54k stars
LLM serving engine for high-throughput, memory-efficient inference
created 2 years ago, updated 14 hours ago