ArXivChatGuru by redis-developer

RAG app for interacting with ArXiv research papers

Created 2 years ago

553 stars

Top 57.8% on SourcePulse

1 Expert Loves This Project

Sam Partee

Cofounder of Arcade

Project Summary

ArXivChatGuru lets users ask questions about ArXiv research papers using LangChain and Redis. It is aimed at researchers and developers who want to understand Retrieval Augmented Generation (RAG) systems, and it serves as a practical demonstration of vector databases and semantic caching applied to scientific literature.

How It Works

The application retrieves papers from ArXiv based on a user-provided topic. These papers are then segmented into smaller chunks, and embeddings are generated for each chunk. Redis serves as a vector database, storing these embeddings for efficient similarity search. When a user asks a question, the system retrieves the most relevant document chunks from Redis and uses them with an OpenAI model to generate an answer, showcasing the RAG process.
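The flow described above (chunk, embed, retrieve, then prompt) can be sketched in a few lines of plain Python. This is an illustrative sketch only, not the project's code: the bag-of-words `embed` function stands in for a real embedding model, an in-memory list stands in for Redis, and the names `chunk_text`, `retrieve`, and `build_prompt` are hypothetical. The final call to an OpenAI model is left out; the sketch stops at the assembled prompt.

```python
import math
import re
from collections import Counter


def chunk_text(text, chunk_size=8):
    """Split text into consecutive chunks of at most chunk_size words."""
    words = text.split()
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), chunk_size)]


def embed(text):
    """Toy embedding (assumption): word counts instead of a learned vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(question, index, k=2):
    """Return the k chunks whose embeddings are most similar to the question."""
    q = embed(question)
    ranked = sorted(index, key=lambda item: cosine_similarity(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


def build_prompt(question, context_chunks):
    """Combine retrieved chunks and the question into one prompt string."""
    context = "\n".join(f"- {c}" for c in context_chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


# Placeholder text standing in for a downloaded paper.
paper = ("Redis can store vector embeddings for similarity search. "
         "Semantic caching reuses answers to similar questions. "
         "Streamlit renders the chat interface in the browser.")

# The in-memory list below plays the role of the vector database.
index = [(chunk, embed(chunk)) for chunk in chunk_text(paper)]
question = "How does Redis store embeddings?"
top_chunks = retrieve(question, index, k=1)
prompt = build_prompt(question, top_chunks)
```

In a real deployment the sorted scan would be replaced by a similarity query against the vector store, and `prompt` would be sent to the language model.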

Quick Start & Requirements

Install: Clone the repository, create a .env file that sets OPENAI_API_KEY, then run poetry install --no-root followed by poetry run streamlit run app.py. Alternatively, once .env is in place, run docker compose up.
Prerequisites: OpenAI API key, Python 3.x, Poetry (or Docker for the docker compose route).
Links: GitHub Repo
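
Before launching the app, it can help to confirm that .env actually defines OPENAI_API_KEY. The following is a minimal sketch under stated assumptions: it parses simple KEY=VALUE lines only, the helper names `parse_env` and `missing_keys` are hypothetical, and the application itself may load its environment differently.

```python
def parse_env(text):
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(env, required=("OPENAI_API_KEY",)):
    """Return the required keys that are absent or empty."""
    return [key for key in required if not env.get(key)]


# Placeholder content; a real check would read the .env file from disk.
sample = "# local settings\nOPENAI_API_KEY=sk-placeholder\n"
env = parse_env(sample)
problems = missing_keys(env)
```

An empty `problems` list means every required key has a non-empty value.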

Highlighted Details

Demonstrates LangChain's ArXiv Loader for direct paper retrieval.
Utilizes Redis as both a vector database and a semantic cache for RAG.
Lets users explore how context window size, vector distance, and document retrieval affect RAG performance.
Built with Streamlit for an interactive user interface.
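
The semantic cache and vector distance ideas listed above can be illustrated with a small in-memory sketch. This is not the project's implementation and does not use Redis: the `SemanticCache` class, its `distance_threshold` parameter, and the hand-written three-dimensional vectors are all assumptions made for illustration. A query counts as a cache hit when its embedding lies within the threshold distance of a stored one.

```python
import math


def cosine_distance(a, b):
    """Cosine distance (1 - cosine similarity) between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm if norm else 1.0


class SemanticCache:
    """Hypothetical in-memory cache keyed by embedding similarity."""

    def __init__(self, distance_threshold=0.1):
        self.distance_threshold = distance_threshold
        self.entries = []  # list of (embedding, answer) pairs

    def lookup(self, embedding):
        """Return the closest cached answer within the threshold, else None."""
        best_answer = None
        best_distance = self.distance_threshold
        for stored, answer in self.entries:
            distance = cosine_distance(embedding, stored)
            if distance <= best_distance:
                best_answer, best_distance = answer, distance
        return best_answer

    def store(self, embedding, answer):
        """Remember an answer under the embedding of its question."""
        self.entries.append((list(embedding), answer))


cache = SemanticCache(distance_threshold=0.1)
cache.store([1.0, 0.0, 0.0], "cached answer")

hit = cache.lookup([0.98, 0.05, 0.0])   # nearly the same direction
miss = cache.lookup([0.0, 1.0, 0.0])    # orthogonal, so far away
```

Raising the threshold makes the cache answer more loosely related questions from memory. Lowering it sends more questions through retrieval and generation, which is the kind of trade-off the app lets users observe.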

Maintenance & Community

The project is published under the redis-developer GitHub organization, which suggests backing or attention from the Redis community. Contributions and feedback are welcome.

Licensing & Compatibility

The README does not explicitly state a license. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The project is explicitly described as a learning tool rather than a production-ready application, and it is not designed for scalability. Its chunking strategy is described as "arbitrary."

Health Check

Last Commit: 9 months ago
Responsiveness: Inactive

Star History

2 stars in the last 30 days