This project provides a web application for interacting with PDF documents using AI, specifically targeting users who need to quickly extract information or engage in Q&A over their documents. It leverages a Retrieval-Augmented Generation (RAG) pipeline powered by Together AI for language models and embeddings, with MongoDB Atlas serving as the vector database.
How It Works
The application utilizes a RAG architecture. PDFs are processed, chunked, and converted into embeddings using the M2 Bert 80M model via Together AI. These embeddings are stored in a MongoDB Atlas vector database. When a user queries the PDF, their query is also embedded, and similar document chunks are retrieved from the database. These chunks are then passed to the Mixtral LLM (also via Together AI) as context to generate an answer. LangChain.js orchestrates the RAG pipeline.
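The retrieval step described above can be sketched as follows. This is a hypothetical, self-contained illustration: in the real app the embeddings come from Together AI's M2 Bert 80M model (768 dimensions) and nearest-neighbor search is done by the MongoDB Atlas vector index, while here tiny mock vectors and an in-memory search stand in for both.

```typescript
// A stored document chunk with its embedding vector.
type Chunk = { text: string; embedding: number[] };

// Euclidean distance, matching the similarity metric of the Atlas index.
function euclidean(a: number[], b: number[]): number {
  return Math.sqrt(a.reduce((sum, ai, i) => sum + (ai - b[i]) ** 2, 0));
}

// Retrieve the k chunks closest to the query embedding.
function retrieve(query: number[], chunks: Chunk[], k: number): Chunk[] {
  return [...chunks]
    .sort((x, y) => euclidean(query, x.embedding) - euclidean(query, y.embedding))
    .slice(0, k);
}

// Assemble the context-stuffed prompt that would be sent to the Mixtral LLM.
function buildPrompt(question: string, context: Chunk[]): string {
  const ctx = context.map((c) => c.text).join("\n---\n");
  return `Answer using only this context:\n${ctx}\n\nQuestion: ${question}`;
}

// Mock data: in the real pipeline these vectors come from the embedding model.
const chunks: Chunk[] = [
  { text: "The invoice total is $42.", embedding: [1, 0] },
  { text: "Payment is due in 30 days.", embedding: [0, 1] },
  { text: "The vendor is Acme Corp.", embedding: [0.9, 0.1] },
];

const queryEmbedding = [1, 0]; // mock embedding for "What is the total?"
const top = retrieve(queryEmbedding, chunks, 2);
console.log(buildPrompt("What is the total?", top));
```

In the actual template, LangChain.js wraps these steps: a vector-store retriever replaces `retrieve`, and a chain handles prompt assembly and the LLM call.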
Quick Start & Requirements
- Install/Run: Deploy to Vercel or other hosts. Requires setting up several external services.
- Prerequisites: Together AI account, MongoDB Atlas cluster with a vector search index configured for 768 dimensions and Euclidean similarity, Bytescale account, and Clerk account. Environment variables must be configured via an `.env` file.
- Setup: each of the services above must be provisioned and its credentials supplied as environment variables before the app will run.
- Links: see `.example.env` for the required environment variables.
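The vector search index mentioned in the prerequisites could look like the following MongoDB Atlas Vector Search index definition. This is a sketch based on the stated requirements (768 dimensions, Euclidean similarity); the `path` value `embedding` is an assumption about the field name the app writes vectors to and may differ in the actual template.

```json
{
  "fields": [
    {
      "type": "vector",
      "path": "embedding",
      "numDimensions": 768,
      "similarity": "euclidean"
    }
  ]
}
```

In Atlas, this definition is created on the collection that stores the document chunks, via the Atlas UI or CLI.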
Highlighted Details
- Powered by Together AI for Mixtral LLM and M2 Bert 80M embeddings.
- Uses MongoDB Atlas as a vector database with a specific vector search index configuration.
- Built with Next.js App Router, Clerk for authentication, and Tailwind CSS.
- Bytescale is used for PDF storage.
Maintenance & Community
- Sponsored by Together AI, Bytescale, Pinecone, and Clerk.
- Future tasks include adding PDF deletion, exploring different embedding models, prompt engineering, API route protection, and benchmarking.
- Contributions are welcome.
Licensing & Compatibility
- The README does not explicitly state a license. The dependence on Together AI, MongoDB Atlas, Bytescale, and Clerk also brings those platforms' costs and terms of service into play; suitability for commercial use depends on their terms and on any explicit license for this repository.
Limitations & Caveats
- The setup process is complex, requiring integration with multiple third-party services and specific database configurations.
- The project is described as a "template," suggesting it may require significant customization for production use.
- PDFs larger than 10MB may require compression.