semantic-search-nextjs-pinecone-langchain-chatgpt by dabit3

Full-stack starter for semantic search over documents

Created 2 years ago

763 stars

Top 45.8% on SourcePulse

1 Expert Loves This Project

Elvis Saravia

Founder of DAIR.AI

Project Summary

This project is a full-stack starter kit for building semantic search applications. It targets developers who want to combine Next.js, LangchainJS, Pinecone, and OpenAI's GPT models to build conversational AI experiences over their own data. Its main benefit is a functional, if basic, template that shortens initial setup.

How It Works

The application uses LangchainJS to embed text files as vectors. These vectors are stored and indexed in Pinecone, a vector database optimized for similarity search. A Next.js frontend lets users query the data semantically, with GPT-3 handling natural language understanding and response generation. Each stage of the pipeline (embedding, indexing, and querying) is handled by a tool specialized for it.
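The embed, index, and query stages can be illustrated with a small in-memory sketch. This is not the project's code: `toyEmbed` is a hypothetical word-hashing stand-in for a real embedding model, and the plain array with a cosine-similarity scan stands in for Pinecone. It only shows the shape of the pipeline under those assumptions.

```typescript
// Minimal in-memory sketch of embed -> index -> query.
// Assumption: toyEmbed is a toy replacement for a real embedding model,
// and the array scan below replaces a vector database such as Pinecone.
type Vec = number[];

interface IndexedDoc {
  id: string;
  text: string;
  vector: Vec;
}

// Hash each word into one of `dims` buckets and count occurrences.
function toyEmbed(text: string, dims: number = 64): Vec {
  const vec: Vec = new Array(dims).fill(0);
  const words = text.toLowerCase().match(/[a-z0-9]+/g) ?? [];
  for (const word of words) {
    let hash = 0;
    for (let i = 0; i < word.length; i++) {
      hash = (hash * 31 + word.charCodeAt(i)) % dims;
    }
    vec[hash] += 1;
  }
  return vec;
}

// Cosine similarity; returns 0 when either vector has zero length.
function cosineSimilarity(a: Vec, b: Vec): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

// "Index" step: embed every document once and keep the vectors.
function buildIndex(docs: { id: string; text: string }[]): IndexedDoc[] {
  return docs.map((d) => ({ id: d.id, text: d.text, vector: toyEmbed(d.text) }));
}

// "Query" step: embed the query and return the topK most similar documents.
function search(index: IndexedDoc[], query: string, topK: number = 3) {
  const q = toyEmbed(query);
  return index
    .map((d) => ({ id: d.id, score: cosineSimilarity(q, d.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topK);
}
```

In a real deployment the retrieved documents would then be passed to the language model as context for generating an answer; that step is omitted here.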

Quick Start & Requirements

Install dependencies: npm install or yarn install
Run the app: npm run dev
Prerequisites: OpenAI API key, Pinecone API key.
Data: Place custom text or markdown files in the /documents folder.
Setup time: Index initialization can take 2-4 minutes.
Documentation: The project is a Next.js port of a Node.js tutorial, which serves as the reference documentation.

Highlighted Details

Integrates Next.js, LangchainJS, Pinecone, and GPT-3.
Enables semantic search over custom text data.
Uses Pinecone for efficient vector storage and retrieval.
Provides a conversational interface powered by GPT-3.

Maintenance & Community

The project is a personal starter kit by the author, Nader Dabit. Further community engagement or maintenance status is not detailed in the README.

Licensing & Compatibility

The README does not explicitly state a license. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The project is described as a "basic starter project" and may require significant modification for production use. Index initialization relies on a setTimeout wait, so the process may fail if index creation takes longer than 3 minutes; in that case, monitor the index manually and re-run. The default data is specific to the Lens protocol and must be replaced for other use cases.
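One common alternative to a fixed wait is to poll for readiness until a deadline. The sketch below is not the project's code: `waitUntilReady` and its `isReady` callback are hypothetical names, and the callback stands in for whatever check reports that the Pinecone index has finished initializing. The default timeout and interval are illustrative values only.

```typescript
// Hypothetical helper: polls `isReady` until it resolves to true,
// or throws once the next poll would fall past the deadline.
async function waitUntilReady(
  isReady: () => Promise<boolean>,
  timeoutMs: number = 300000, // assumed overall budget (5 minutes)
  intervalMs: number = 5000, // assumed delay between checks
): Promise<void> {
  const deadline = Date.now() + timeoutMs;
  while (true) {
    if (await isReady()) return;
    if (Date.now() + intervalMs > deadline) {
      throw new Error(`Not ready after ${timeoutMs} ms`);
    }
    // Sleep before the next readiness check.
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

Compared with a single fixed delay, this returns as soon as the resource is ready and fails with an explicit error when it is not, rather than letting a later step fail.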

Health Check

Last commit: 1 year ago
Responsiveness: Inactive

Star History

2 stars in the last 30 days