documentation-helper by emarco177

RAG-powered documentation assistant

Created 3 years ago

324 stars

Top 83.6% on SourcePulse

Project Summary

documentation-helper is a RAG-based documentation assistant built with LangChain, Pinecone, and Tavily. It provides a web application for querying the LangChain documentation and targets developers and researchers who want accurate, context-aware answers with source citations. It combines web crawling with conversational memory.

How It Works

The project implements a Retrieval-Augmented Generation (RAG) pipeline. Tavily crawls the web and extracts content in real time, and the documentation is then chunked and preprocessed. Pinecone is used for embedding and indexing, which enables fast similarity search. LangChain orchestrates retrieval of the documents relevant to each user query and uses a conversational memory system for coreference resolution, so that follow-up questions can refer back to earlier turns. Finally, an OpenAI GPT model generates contextual answers with source citations, which are presented through a Streamlit interface.
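The chunk, index, retrieve, and cite steps can be illustrated with a small standard-library sketch. This is not the project's code: it assumes a toy word-count "embedding" in place of a real embedding model, an in-memory list in place of Pinecone, and hypothetical names (Chunk, chunk_text, embed, retrieve, build_prompt) and example.com URLs chosen only for illustration. It stops at building the prompt and does not call any LLM.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass
from typing import List


@dataclass
class Chunk:
    text: str
    source: str  # URL of the page this chunk was cut from


def chunk_text(text: str, source: str, size: int = 40, overlap: int = 10) -> List[Chunk]:
    """Split text into overlapping windows of `size` words (assumes size > overlap)."""
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(Chunk(" ".join(words[start:start + size]), source))
        if start + size >= len(words):
            break  # this window already reaches the end of the text
    return chunks


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts, a stand-in for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm = math.sqrt(sum(c * c for c in a.values())) * math.sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: List[Chunk], k: int = 2) -> List[Chunk]:
    """Return the k chunks most similar to the query (brute-force search)."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c.text)), reverse=True)[:k]


def build_prompt(question: str, hits: List[Chunk]) -> str:
    """Assemble a prompt that numbers each retrieved chunk and keeps its source URL."""
    context = "\n".join(f"[{i}] {c.text} (source: {c.source})" for i, c in enumerate(hits, 1))
    return (
        "Answer using only the context below and cite sources by number.\n\n"
        f"{context}\n\nQuestion: {question}"
    )


# Hypothetical crawled pages, keyed by URL.
pages = {
    "https://example.com/retrievers": (
        "A retriever returns documents for a query. A vector store retriever "
        "wraps a vector store and uses similarity search."
    ),
    "https://example.com/ui": (
        "Streamlit renders the chat interface and shows source links below each answer."
    ),
}
index = [chunk for url, text in pages.items() for chunk in chunk_text(text, url)]
question = "What does a vector store retriever do?"
print(build_prompt(question, retrieve(question, index, k=1)))
```

Keeping the source URL on every chunk is what makes citations possible at answer time: whatever the retriever returns already carries the link it came from. In a real deployment the brute-force sort would be replaced by a query against the vector index.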

Quick Start & Requirements

Prerequisites: Python 3.8+, OpenAI API key, Pinecone API key, Tavily API key.
Installation: Clone the repository, set environment variables (PINECONE_API_KEY, OPENAI_API_KEY, TAVILY_API_KEY) in a .env file, and run pipenv install.
Ingestion: Execute python ingestion.py to crawl and index documentation.
Run: Launch the application with streamlit run main.py and access it via http://localhost:8501.
Docs: Includes Jupyter notebooks for Tavily API tutorials.
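Because ingestion and the app both depend on all three API keys, a quick pre-flight check can save a failed run. The following is a minimal sketch, not part of the repository: the helper names missing_keys and check_environment are hypothetical, and it assumes the variables are already present in the process environment (Pipenv normally loads a .env file for commands started through pipenv run or pipenv shell).

```python
import os
from typing import List, Mapping

# Variable names taken from the installation step above.
REQUIRED_KEYS = ("PINECONE_API_KEY", "OPENAI_API_KEY", "TAVILY_API_KEY")


def missing_keys(env: Mapping[str, str]) -> List[str]:
    """Return the required variable names that are unset or blank in `env`."""
    return [name for name in REQUIRED_KEYS if not env.get(name, "").strip()]


def check_environment() -> None:
    """Exit with a readable message if any required key is missing."""
    missing = missing_keys(os.environ)
    if missing:
        raise SystemExit("Missing environment variables: " + ", ".join(missing))
```

Calling check_environment() before python ingestion.py or streamlit run main.py would surface a missing key immediately instead of partway through a crawl.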

Highlighted Details

Comprehensive RAG pipeline integrating web crawling (Tavily), vector storage (Pinecone), and conversational AI (LangChain, OpenAI GPT).
Features real-time processing, intelligent retrieval, and a user-friendly Streamlit interface with source citations.
Provides tutorial notebooks for Tavily API and crawling capabilities.
Designed as a learning tool for LangChain, vector search, and RAG architecture.

Maintenance & Community

Contributions are welcome via pull requests; major changes should be discussed in an issue first. The project links to the author's portfolio, LinkedIn, and Twitter. No dedicated community channels (e.g., Discord, Slack) are listed.

Licensing & Compatibility

Licensed under the MIT License. No explicit restrictions for commercial use or closed-source linking are mentioned.

Limitations & Caveats

The project requires API keys for Pinecone, OpenAI, and Tavily to function. The README does not detail specific limitations, unsupported platforms, or known bugs.
