LangchainDocuments by nicknochnack

LangChain pipeline for document Q&A

created 2 years ago
250 stars

Top 100.0% on sourcepulse

Project Summary

This project demonstrates how to integrate custom documents into a LangChain pipeline, using ChromaDB as the vector store for retrieval. The result is a GPT-powered application that answers questions from the content of provided PDFs without requiring LLM fine-tuning. It targets developers and researchers who want to build RAG (Retrieval-Augmented Generation) systems.

How It Works

The core approach uses LangChain's document loading and vectorization capabilities, with ChromaDB as the vector store. PDFs are loaded, split into chunks, converted to vectors by an embedding model, and stored in ChromaDB. When a user submits a query, the query is embedded the same way and used to retrieve the most relevant document chunks from ChromaDB. Those chunks are then passed to an LLM together with the original query to generate a grounded response. Because the context is supplied at inference time, no costly LLM fine-tuning is needed.
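The flow described above (chunk, embed, store, retrieve, assemble a prompt) can be sketched in plain Python. This is a minimal illustration, not the project's code: it deliberately swaps the real embedding model for a toy bag-of-words vector and swaps ChromaDB for an in-memory list, so that it runs without any dependencies or API key. All names here (split_into_chunks, embed, ToyVectorStore, build_prompt) are hypothetical, and the final LLM call is left out.

```python
import math
import re
from collections import Counter


def split_into_chunks(text, chunk_size=200, overlap=50):
    """Split text into overlapping character windows."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]


def embed(text):
    """Toy embedding: lowercase word counts (stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyVectorStore:
    """In-memory stand-in for a vector store such as ChromaDB (assumption)."""

    def __init__(self):
        self._items = []  # list of (chunk_text, vector) pairs

    def add(self, chunks):
        for chunk in chunks:
            self._items.append((chunk, embed(chunk)))

    def search(self, query, k=2):
        """Return the k stored chunks most similar to the query."""
        query_vec = embed(query)
        ranked = sorted(
            self._items,
            key=lambda item: cosine(query_vec, item[1]),
            reverse=True,
        )
        return [chunk for chunk, _ in ranked[:k]]


def build_prompt(query, context_chunks):
    """Combine retrieved chunks and the question into one prompt for an LLM."""
    context = "\n---\n".join(context_chunks)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )


if __name__ == "__main__":
    store = ToyVectorStore()
    store.add([
        "Chroma stores embedding vectors for retrieval.",
        "Streamlit renders the web interface.",
    ])
    question = "Which component stores vectors?"
    print(build_prompt(question, store.search(question, k=1)))
```

In a real pipeline the prompt returned by build_prompt would be sent to the LLM. The point of the sketch is that the model only sees the retrieved chunks, which is why no fine-tuning is required.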

Quick Start & Requirements

  • Install dependencies: pip install -r requirements.txt
  • Run the application: streamlit run app.py
  • Prerequisites: a Python environment and an OpenAI API key, which is added to app.py.
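The summary says the OpenAI API key is placed in app.py. As an alternative to hardcoding it, the key could be read from the environment at startup. The sketch below assumes the key is exported as OPENAI_API_KEY, and the helper name load_openai_key is hypothetical. It is not taken from the project's code.

```python
import os


def load_openai_key(env=None):
    """Return the OpenAI API key from the environment, or fail with a clear message.

    `env` defaults to os.environ; passing a dict makes the helper easy to test.
    """
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "OPENAI_API_KEY is not set; export it before running the app."
        )
    return key
```

Failing early with an explicit error keeps a missing key from surfacing later as an opaque authentication failure, and keeps the secret out of version control.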

Highlighted Details

  • Uses ChromaDB for vector storage and retrieval.
  • Enables RAG without LLM fine-tuning.
  • Demonstrates LangChain VectorStore Agents.

Maintenance & Community

  • Author: Nick Renotte.
  • Community links are not provided in the README.

Licensing & Compatibility

  • License: MIT License.
  • Compatible with commercial use and closed-source applications.

Limitations & Caveats

The project is described as version 1.?, suggesting it may be in an early or evolving stage. Specific limitations regarding document types, chunking strategies, or performance under heavy load are not detailed.

Health Check

  • Last commit: 2 years ago.
  • Responsiveness: Inactive.
  • Pull requests (30d): 0.
  • Issues (30d): 0.
  • Star history: 1 star in the last 90 days.
