semantic-search-nextjs-pinecone-langchain-chatgpt  by dabit3

Full-stack starter for semantic search over documents

created 2 years ago
756 stars

Top 46.9% on sourcepulse

Project Summary

This project provides a full-stack starter kit for building semantic search applications. It targets developers looking to integrate Next.js, LangchainJS, Pinecone, and OpenAI's GPT models to create conversational AI experiences powered by custom data. The primary benefit is a functional, albeit basic, template to accelerate development in this rapidly evolving space.

How It Works

The application uses LangchainJS to embed text files as vectors. These vectors are stored and indexed in Pinecone, a vector database optimized for similarity search. A Next.js frontend lets users query this data semantically, with GPT-3 handling natural language understanding and response generation. Each stage of the pipeline (embedding, indexing, and querying) is handled by a tool specialized for it.
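To make the embed, index, and query stages concrete, here is a minimal in-memory sketch. It is not the project's code and does not call LangchainJS, Pinecone, or OpenAI: `embedText` is a toy word-count stand-in for a real embedding model, `VOCAB` is a hypothetical vocabulary, and the "index" is a plain array ranked by cosine similarity. All names are assumptions made for illustration.

```typescript
// Hypothetical vocabulary for the toy embedding below.
const VOCAB = ["wallet", "profile", "token", "search", "index"];

// Toy stand-in for an embedding model: counts how often each vocabulary word occurs.
// A real model returns dense vectors with hundreds or thousands of dimensions.
function embedText(text: string): number[] {
  const words = text.toLowerCase().split(/\W+/);
  return VOCAB.map((term) => words.filter((w) => w === term).length);
}

// Cosine similarity between two equal-length vectors; 0 if either is all zeros.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

interface IndexedDoc {
  id: string;
  text: string;
  vector: number[];
}

// "Indexing": store each document alongside its vector.
function buildIndex(docs: { id: string; text: string }[]): IndexedDoc[] {
  return docs.map((d) => ({ id: d.id, text: d.text, vector: embedText(d.text) }));
}

// "Querying": embed the question, score every document, return the best topK.
function searchIndex(
  docs: IndexedDoc[],
  question: string,
  topK: number,
): { id: string; score: number }[] {
  const q = embedText(question);
  return docs
    .map((d) => ({ id: d.id, score: cosineSimilarity(q, d.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topK);
}

const docIndex = buildIndex([
  { id: "a", text: "Create a profile and connect a wallet" },
  { id: "b", text: "Build a search index over documents" },
]);
const topMatches = searchIndex(docIndex, "How do I connect my wallet?", 1);
console.log(topMatches);
```

In a full pipeline, the text of the best-matching documents would then be passed to the language model as context for generating an answer. A vector database takes over the storage and ranking steps so they stay fast as the number of documents grows.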

Quick Start & Requirements

  • Install dependencies: npm install or yarn install
  • Run the app: npm run dev
  • Prerequisites: OpenAI API key, Pinecone API key.
  • Data: Place custom text or markdown files in the /documents folder.
  • Setup time: Index initialization can take 2-4 minutes.
  • Documentation: refer to the original Node.js tutorial, of which this project is a Next.js port.

Highlighted Details

  • Integrates Next.js, LangchainJS, Pinecone, and GPT-3.
  • Enables semantic search over custom text data.
  • Uses Pinecone for efficient vector storage and retrieval.
  • Provides a conversational interface powered by GPT-3.

Maintenance & Community

The project is a personal starter kit by the author, Nader Dabit (dabit3). The README does not describe community channels or maintenance status.

Licensing & Compatibility

The README does not explicitly state a license. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The project is described as a "basic starter project" and may require significant modification for production use. Index initialization relies on a fixed setTimeout, so a run can fail if Pinecone takes longer than 3 minutes to create the index; in that case the process has to be monitored and re-run manually. The default data is specific to the Lens protocol and must be replaced for other use cases.
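One common alternative to a fixed wait is to poll for readiness until a deadline. The sketch below shows that general pattern only; it is not taken from the project, and `waitUntilReady` and `fakeIndexStatus` are hypothetical names. In practice the status check would ask the vector database whether the index is ready, which is not shown here.

```typescript
// Polls `isReady` until it resolves to true or the timeout elapses.
// Resolves to true if readiness was observed, false on timeout.
async function waitUntilReady(
  isReady: () => Promise<boolean>,
  intervalMs: number,
  timeoutMs: number,
): Promise<boolean> {
  const deadline = Date.now() + timeoutMs;
  while (true) {
    if (await isReady()) return true;
    // Give up if the next sleep would run past the deadline.
    if (Date.now() + intervalMs > deadline) return false;
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
}

// Hypothetical stand-in for a real status check: reports ready on the third call.
let statusChecks = 0;
const fakeIndexStatus = async (): Promise<boolean> => {
  statusChecks += 1;
  return statusChecks >= 3;
};

// Check every 10 ms, give up after 1 second.
const readyPromise = waitUntilReady(fakeIndexStatus, 10, 1000);
readyPromise.then((ready) => console.log("index ready:", ready));
```

Compared with a single fixed delay, polling returns as soon as the index is ready and reports a timeout explicitly instead of proceeding against an index that may not exist yet.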

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 day
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 1 star in the last 90 days
