semantic-search-nextjs-pinecone-langchain-chatgpt  by dabit3

Full-stack starter for semantic search over documents

Created 2 years ago
760 stars

Top 45.8% on SourcePulse

Project Summary

This project provides a full-stack starter kit for building semantic search applications. It targets developers looking to integrate Next.js, LangchainJS, Pinecone, and OpenAI's GPT models to create conversational AI experiences powered by custom data. The primary benefit is a functional, albeit basic, template to accelerate development in this rapidly evolving space.

How It Works

The application uses LangchainJS to embed text files as vectors. The vectors are stored and indexed in Pinecone, a vector database optimized for similarity search. A Next.js frontend lets users query this data semantically, with GPT-3 handling natural language understanding and response generation. Each stage of the pipeline (embedding, indexing, querying) is handled by a specialized tool.
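The embed, index, and query flow can be illustrated with a small in-memory sketch. This is not the project's code: the word-count `embed` function is a toy stand-in for a real embedding model, the plain array stands in for Pinecone, and every name here (`VOCABULARY`, `buildIndex`, `search`) is hypothetical. It only shows the mechanics of ranking documents by cosine similarity to a query vector.

```typescript
// Toy stand-in for an embedding model: counts occurrences of a fixed vocabulary.
// ASSUMPTION: a real setup would call an embedding API; this is for illustration only.
const VOCABULARY = ["profile", "follow", "post", "wallet", "token", "search"];

function embed(text: string): number[] {
  const words = text.toLowerCase().split(/[^a-z]+/).filter(Boolean);
  return VOCABULARY.map((term) => words.filter((w) => w === term).length);
}

// Cosine similarity between two equal-length vectors; 0 if either is all zeros.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

interface IndexedDocument {
  id: string;
  text: string;
  vector: number[];
}

// Stand-in for the vector database: documents are embedded once and kept in an array.
function buildIndex(docs: { id: string; text: string }[]): IndexedDocument[] {
  return docs.map((doc) => ({ id: doc.id, text: doc.text, vector: embed(doc.text) }));
}

// Embed the query, score every document, and return the topK closest ones.
function search(index: IndexedDocument[], query: string, topK: number): IndexedDocument[] {
  const queryVector = embed(query);
  return index
    .map((doc) => ({ doc, score: cosineSimilarity(queryVector, doc.vector) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, topK)
    .map((entry) => entry.doc);
}
```

In the pipeline described above, the top-ranked documents would then be passed to the language model as context for generating an answer; that step is omitted here because it requires a network call.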

Quick Start & Requirements

  • Install dependencies: npm install or yarn install
  • Run the app: npm run dev
  • Prerequisites: OpenAI API key, Pinecone API key.
  • Data: Place custom text or markdown files in the /documents folder.
  • Setup time: Index initialization can take 2-4 minutes.
  • Documentation: See the original Node.js tutorial; this project is a Next.js port of it.
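Because the app needs both API keys and a populated /documents folder before it can index anything, a small pre-flight check can catch setup mistakes early. The sketch below is illustrative only: the variable names `OPENAI_API_KEY` and `PINECONE_API_KEY` and the `.txt`/`.md` extensions are assumptions, so check the repository for the names and file types it actually expects.

```typescript
// ASSUMPTION: hypothetical environment variable names, not confirmed by the source.
const REQUIRED_KEYS = ["OPENAI_API_KEY", "PINECONE_API_KEY"];

// Returns the names of required keys that are missing or blank in `env`.
function findMissingKeys(
  env: Record<string, string | undefined>,
  required: string[] = REQUIRED_KEYS
): string[] {
  return required.filter((key) => {
    const value = env[key];
    return value === undefined || value.trim() === "";
  });
}

// ASSUMPTION: "text or markdown files" is taken to mean .txt and .md extensions.
function isIndexableDocument(fileName: string): boolean {
  const lower = fileName.toLowerCase();
  return lower.endsWith(".txt") || lower.endsWith(".md");
}
```

In a Node.js script this could be called as `findMissingKeys(process.env)` before starting the indexing step, with `isIndexableDocument` applied to the file names found in /documents.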

Highlighted Details

  • Integrates Next.js, LangchainJS, Pinecone, and GPT-3.
  • Enables semantic search over custom text data.
  • Uses Pinecone for efficient vector storage and retrieval.
  • Provides a conversational interface powered by GPT-3.

Maintenance & Community

The project is a personal starter kit by the author, Nader Dabit (dabit3). The README does not describe community engagement or maintenance status.

Licensing & Compatibility

The README does not explicitly state a license. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The project is described as a "basic starter project" and may require significant modification for production use. Index initialization relies on a setTimeout wait; if index creation takes longer than 3 minutes, the setup may fail, so the process has to be monitored and re-run manually. The default data is specific to the Lens protocol and must be replaced for other use cases.
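One common alternative to a fixed setTimeout wait is to poll for readiness until a deadline. The sketch below is not what the project does; it is a generic helper under stated assumptions. `isReady` is a hypothetical callback that would wrap whatever status check the vector database client offers, and `waitUntilReady` and `WaitOptions` are names invented for this example.

```typescript
interface WaitOptions {
  intervalMs: number; // pause between readiness checks
  timeoutMs: number;  // total time budget before giving up
}

// Polls `isReady` until it resolves to true, and returns the number of checks made.
// Throws if another full interval would push past the deadline.
// ASSUMPTION: `isReady` is a hypothetical callback supplied by the caller.
async function waitUntilReady(
  isReady: () => Promise<boolean>,
  options: WaitOptions
): Promise<number> {
  const deadline = Date.now() + options.timeoutMs;
  let attempts = 0;
  while (true) {
    attempts++;
    if (await isReady()) return attempts;
    if (Date.now() + options.intervalMs > deadline) {
      throw new Error(`Index not ready after ${options.timeoutMs} ms`);
    }
    await new Promise<void>((resolve) => setTimeout(resolve, options.intervalMs));
  }
}
```

With a generous `timeoutMs` (for example, longer than the 2-4 minutes mentioned above), this approach returns as soon as the index is ready and fails with an explicit error otherwise, instead of continuing after a fixed delay regardless of the index state.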

Health Check

  • Last commit: 1 year ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0

Star History

3 stars in the last 30 days

Explore Similar Projects

Starred by Jared Palmer (Ex-VP AI at Vercel; Founder of Turborepo; Author of Formik, TSDX) and Andrew Kane (Author of pgvector).

chatgpt-pgvector by gannonh

0%
938
Domain-specific chat completions app
Created 2 years ago
Updated 2 years ago
Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems") and Simon Willison (Coauthor of Django).

semantra by freedmand

0.1%
3k
CLI tool for semantic document search
Created 2 years ago
Updated 1 year ago