wikipedia-semantic-search by upstash

Semantic search engine and RAG chatbot using Wikipedia data

Created 1 year ago
472 stars

Top 64.7% on SourcePulse

Project Summary

This project provides a semantic search engine and RAG chatbot built on Wikipedia data, targeting developers and researchers interested in vector databases and RAG applications. It demonstrates indexing millions of Wikipedia articles for efficient, cross-lingual semantic search and conversational AI.

How It Works

The system stores and queries millions of vector embeddings of Wikipedia articles in Upstash Vector. The embeddings come from the BGE-M3 model, which is multilingual, so text in different languages maps into a shared vector space and can be searched semantically. The RAG chatbot is built with the Upstash RAG Chat SDK: chat sessions are persisted in Upstash Redis, and LLM calls to Meta-Llama-3-8B-Instruct go through QStash's LLM API.
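To make the retrieve-then-generate flow concrete, here is a toy, in-memory TypeScript sketch. It is not the project's code: the three-dimensional vectors stand in for BGE-M3 embeddings, the linear cosine scan stands in for Upstash Vector's index, and the prompt template in the hypothetical buildPrompt helper is an assumption (the RAG Chat SDK handles this step with its own template).

```typescript
// Toy sketch of retrieval-augmented generation (RAG), NOT the project's code.
// Assumptions: tiny hand-made vectors stand in for BGE-M3 embeddings, a linear
// cosine scan stands in for Upstash Vector, and the prompt template is hypothetical.

interface Passage {
  id: string;
  text: string;
  vector: number[];
}

interface Scored {
  passage: Passage;
  score: number;
}

// Cosine similarity between two vectors of equal length.
function cosine(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Retrieval: brute-force top-k by cosine similarity. A vector database
// replaces this O(n) scan with an index so it scales to millions of vectors.
function topK(query: number[], corpus: Passage[], k: number): Scored[] {
  return corpus
    .map((passage) => ({ passage, score: cosine(query, passage.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}

// Augmentation: splice the retrieved passages into a (hypothetical) prompt
// that would then be sent to the LLM for generation.
function buildPrompt(question: string, hits: Scored[]): string {
  const context = hits.map((h, i) => `[${i + 1}] ${h.passage.text}`).join("\n");
  return `Answer using only the context below.\n\nContext:\n${context}\n\nQuestion: ${question}`;
}

// Usage with made-up data.
const demoCorpus: Passage[] = [
  { id: "a", text: "Paris is the capital of France.", vector: [1, 0, 0] },
  { id: "b", text: "The Nile flows through Egypt.", vector: [0, 1, 0] },
  { id: "c", text: "France borders Spain.", vector: [0.8, 0.2, 0] },
];
const demoHits = topK([1, 0, 0], demoCorpus, 2); // ids "a" then "c"
console.log(buildPrompt("What is the capital of France?", demoHits));
```

The split into retrieval and augmentation mirrors the division of labor above: the vector store only ranks passages, and the LLM only sees what retrieval hands it.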

Quick Start & Requirements

  • Install dependencies: pnpm install
  • Run development server: pnpm dev
  • Prerequisites: Upstash Vector database (with BGE-M3 model), Upstash Redis database, QStash credentials.
  • Configuration: Requires a .env file with UPSTASH_VECTOR_REST_URL, UPSTASH_VECTOR_REST_TOKEN, UPSTASH_REDIS_REST_TOKEN, UPSTASH_REDIS_REST_URL, and QSTASH_TOKEN.
  • Data Indexing: Vectors must be upserted into the namespace for their language (e.g., en for English).
  • Live Demo: https://wikipedia-semantic-search.upstash.dev/
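The configuration and namespace requirements can be checked up front. The TypeScript sketch below validates the five variables listed above and builds, without sending, a query scoped to one language namespace. loadConfig and buildQueryRequest are hypothetical helper names, and the REST details are an assumption rather than something taken from the project: it targets Upstash Vector's query-data endpoint with the namespace as a path suffix and a bearer token, so verify them against the current Upstash documentation.

```typescript
// Hypothetical helpers, not part of the project. The REST path and body shape
// below are ASSUMPTIONS about Upstash Vector's `query-data` endpoint.

// Variables the project's .env file must define (from the README).
const REQUIRED_ENV = [
  "UPSTASH_VECTOR_REST_URL",
  "UPSTASH_VECTOR_REST_TOKEN",
  "UPSTASH_REDIS_REST_URL",
  "UPSTASH_REDIS_REST_TOKEN",
  "QSTASH_TOKEN",
] as const;

type EnvName = (typeof REQUIRED_ENV)[number];
type Config = Record<EnvName, string>;

interface PreparedRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Fail fast, naming every missing or empty variable at once.
function loadConfig(env: Record<string, string | undefined>): Config {
  const missing = REQUIRED_ENV.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }
  const config = {} as Config;
  for (const name of REQUIRED_ENV) {
    config[name] = env[name] as string;
  }
  return config;
}

// Build (but do not send) a text query against one language namespace, e.g. "en".
// ASSUMPTION: the index's embedding model (BGE-M3) embeds `data` server-side.
function buildQueryRequest(config: Config, namespace: string, text: string, topK = 5): PreparedRequest {
  const base = config.UPSTASH_VECTOR_REST_URL.replace(/\/+$/, "");
  return {
    url: `${base}/query-data/${encodeURIComponent(namespace)}`,
    method: "POST",
    headers: {
      Authorization: `Bearer ${config.UPSTASH_VECTOR_REST_TOKEN}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ data: text, topK, includeMetadata: true }),
  };
}
```

In a Node app this would be called as loadConfig(process.env), and the prepared request passed to fetch(req.url, { method: req.method, headers: req.headers, body: req.body }). Keeping request construction separate from sending makes it easy to inspect or test without network access.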

Highlighted Details

  • Indexed over 144 million vectors from Wikipedia articles across 11 languages.
  • Utilizes BGE-M3 embedding model for robust multilingual support.
  • Implements semantic search with cross-lingual querying capabilities.
  • Features a RAG chatbot powered by Upstash RAG Chat SDK and Meta-Llama-3-8B-Instruct.
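Because every language namespace is embedded with the same multilingual model and metric, similarity scores from different namespaces are at least on a common scale (though same-language matches may still score higher). One plausible cross-lingual strategy is therefore to run the same query against several namespaces and merge the hits by score. The TypeScript sketch below shows only that merge step; the ScoredId and Hit shapes, mergeTopK, and the de namespace in the usage are hypothetical, and the project may instead query a single user-selected namespace.

```typescript
// Hypothetical shapes, not the project's API.
interface ScoredId {
  id: string;
  score: number;
}

interface Hit extends ScoredId {
  namespace: string;
}

// Merge per-namespace result lists into one global top-k, highest score first.
// Only sensible because all namespaces share one embedding model and metric.
function mergeTopK(perNamespace: Record<string, ScoredId[]>, k: number): Hit[] {
  const all: Hit[] = [];
  for (const namespace of Object.keys(perNamespace)) {
    for (const hit of perNamespace[namespace]) {
      all.push({ id: hit.id, score: hit.score, namespace });
    }
  }
  return all.sort((a, b) => b.score - a.score).slice(0, k);
}

// Usage with made-up scores for one query sent to two namespaces.
const merged = mergeTopK(
  {
    en: [{ id: "en-1", score: 0.91 }, { id: "en-2", score: 0.52 }],
    de: [{ id: "de-1", score: 0.78 }],
  },
  2,
);
console.log(merged.map((h) => `${h.namespace}:${h.id}`)); // [ 'en:en-1', 'de:de-1' ]
```

Tagging each hit with its namespace keeps the source language visible after merging, which a UI can use to label results.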

Maintenance & Community

The project is maintained by Upstash. Contributions are welcome via issues and pull requests. Further contact information can be found in the README.

Licensing & Compatibility

The README does not state a license for the repository. Until the licensing terms are clarified, it is unclear whether the code may be used commercially or linked into closed-source software.

Limitations & Caveats

The project relies heavily on Upstash services, potentially creating vendor lock-in. The setup requires obtaining and configuring credentials for multiple Upstash services. The README does not detail performance benchmarks or specific hardware requirements beyond the need for Upstash service access.

Health Check

  • Last Commit: 5 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 1 star in the last 30 days

Explore Similar Projects

Starred by John Resig (Author of jQuery; Chief Software Architect at Khan Academy), Simon Horup Eskildsen (Cofounder of Turbopuffer), and 21 more.

meilisearch by meilisearch

Search engine API for integrating AI-powered hybrid search

Created 7 years ago
Updated 1 day ago
53k stars
Top 0.2% on SourcePulse