documentation-helper  by emarco177

RAG-powered documentation assistant

Created 2 years ago
260 stars

Top 97.6% on SourcePulse

Project Summary

This project is a RAG-based documentation assistant built with LangChain, Pinecone, and Tavily: a web application for querying the LangChain documentation. It targets developers and researchers who want context-aware answers with source citations, and it combines web crawling with conversational memory.

How It Works

The project implements a Retrieval-Augmented Generation (RAG) pipeline. Tavily crawls the documentation site and extracts page content, which is then chunked and preprocessed. The chunks are embedded and indexed in Pinecone for fast similarity search. LangChain orchestrates retrieval of the documents relevant to each user query and keeps a conversational memory so that follow-up questions can refer back to earlier turns (coreference resolution). Finally, an OpenAI GPT model generates the answer from the retrieved context, with source citations, and the result is presented through a Streamlit interface.
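The stages described above can be sketched with the standard library alone. This is a minimal illustration, not the project's code: a bag-of-words vector stands in for a real embedding model, a Python list stands in for the Pinecone index, and the function names (chunk_text, embed, build_index, retrieve, build_prompt) are hypothetical. The LLM call itself is omitted; the sketch stops at the prompt that would be sent to the model.

```python
import math
import re
from collections import Counter

# Hypothetical in-memory stand-ins: the real project crawls with Tavily,
# indexes in Pinecone, and sends the final prompt to an OpenAI model.


def chunk_text(text, size=8, overlap=2):
    """Split text into windows of `size` words that share `overlap` words."""
    words = text.split()
    if not words:
        return []
    step = size - overlap
    last_start = max(len(words) - overlap, 1)
    return [" ".join(words[i:i + size]) for i in range(0, last_start, step)]


def embed(text):
    """Toy embedding: lowercase token counts (assumption, not a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def build_index(pages):
    """Chunk each page (url -> text) and store every chunk with its source."""
    index = []
    for url, text in pages.items():
        for chunk in chunk_text(text):
            index.append({"source": url, "text": chunk, "vector": embed(chunk)})
    return index


def retrieve(index, query, k=2):
    """Return the k chunks most similar to the query."""
    query_vector = embed(query)
    ranked = sorted(
        index, key=lambda entry: cosine(query_vector, entry["vector"]), reverse=True
    )
    return ranked[:k]


def build_prompt(question, hits, history=()):
    """Assemble a prompt with chat history and numbered, citable sources."""
    context = "\n".join(
        f"[{i}] {hit['source']}: {hit['text']}" for i, hit in enumerate(hits, 1)
    )
    past = "\n".join(f"{role}: {message}" for role, message in history)
    return (
        "Answer using only the context and cite sources by number.\n"
        f"Chat history:\n{past}\n"
        f"Context:\n{context}\n"
        f"Question: {question}"
    )
```

Passing earlier turns through history is what lets the model resolve a follow-up such as "how do I configure it?" against the previous question. Numbering the retrieved chunks gives the model something concrete to cite.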

Quick Start & Requirements

  • Prerequisites: Python 3.8+, OpenAI API key, Pinecone API key, Tavily API key.
  • Installation: Clone the repository, set environment variables (PINECONE_API_KEY, OPENAI_API_KEY, TAVILY_API_KEY) in a .env file, and run pipenv install.
  • Ingestion: Execute python ingestion.py to crawl and index documentation.
  • Run: Launch the application with streamlit run main.py and access it via http://localhost:8501.
  • Docs: Includes Jupyter notebooks for Tavily API tutorials.
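Because every stage depends on an external service, it can help to confirm that the three keys are set before running ingestion or the app. The following is a small sketch of such a check, not part of the repository; the helper name missing_keys is hypothetical. It assumes the variables are already in the process environment, which pipenv arranges automatically by loading .env for commands started through pipenv run or pipenv shell.

```python
import os

# Key names taken from the installation step above.
REQUIRED_KEYS = ("PINECONE_API_KEY", "OPENAI_API_KEY", "TAVILY_API_KEY")


def missing_keys(env=None):
    """Return the required keys that are unset or blank in `env`.

    `env` defaults to the current process environment; a plain dict can be
    passed instead, which makes the check easy to test.
    """
    env = os.environ if env is None else env
    return [key for key in REQUIRED_KEYS if not env.get(key, "").strip()]


if __name__ == "__main__":
    missing = missing_keys()
    if missing:
        raise SystemExit("Missing environment variables: " + ", ".join(missing))
    print("All required API keys are set.")
```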

Highlighted Details

  • Comprehensive RAG pipeline integrating web crawling (Tavily), vector storage (Pinecone), and conversational AI (LangChain, OpenAI GPT).
  • Features real-time processing, intelligent retrieval, and a user-friendly Streamlit interface with source citations.
  • Provides tutorial notebooks for Tavily API and crawling capabilities.
  • Designed as a learning tool for LangChain, vector search, and RAG architecture.

Maintenance & Community

Contributions are welcome via pull requests; major changes should be discussed in an issue first. The README links to the author's portfolio, LinkedIn, and Twitter. No community channels (e.g., Discord, Slack) are listed.

Licensing & Compatibility

Licensed under the MIT License. No explicit restrictions for commercial use or closed-source linking are mentioned.

Limitations & Caveats

The project requires API keys for Pinecone, OpenAI, and Tavily to function. The README does not detail specific limitations, unsupported platforms, or known bugs.

Health Check

  • Last Commit: 2 weeks ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 1
  • Issues (30d): 0
  • Star History: 15 stars in the last 30 days
