Semantic search engine and RAG chatbot using Wikipedia data
This project provides a semantic search engine and RAG chatbot built on Wikipedia data, targeting developers and researchers interested in vector databases and RAG applications. It demonstrates indexing millions of Wikipedia articles for efficient, cross-lingual semantic search and conversational AI.
How It Works
The system stores and queries millions of vector embeddings, generated from Wikipedia articles, in Upstash Vector. The embeddings come from the BGE-M3 model, which supports multilingual semantic search. The RAG chatbot is built with the Upstash RAG Chat SDK: chat sessions are persisted in Upstash Redis, and LLM calls go through the QStash LLM APIs using Meta-Llama-3-8B-Instruct.
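The retrieve-then-prompt flow described above can be illustrated with a minimal, self-contained sketch. It is not the project's code: the in-memory `Doc` array stands in for Upstash Vector, the tiny hand-written vectors stand in for BGE-M3 embeddings, and the names `cosine`, `topK`, and `buildPrompt` are hypothetical helpers introduced only for illustration.

```typescript
// Hypothetical in-memory stand-in for a vector index.
// In the real project, storage and similarity search are handled by Upstash Vector.
type Doc = { id: string; text: string; vector: number[] };

// Cosine similarity between two equal-length vectors; returns 0 for a zero vector.
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

// Return the k documents whose vectors are most similar to the query vector.
function topK(query: number[], docs: Doc[], k: number): Doc[] {
  return docs
    .map((doc) => ({ doc, score: cosine(query, doc.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((x) => x.doc);
}

// Assemble a prompt that grounds the LLM's answer in the retrieved passages.
function buildPrompt(question: string, context: Doc[]): string {
  const passages = context.map((d, i) => `[${i + 1}] ${d.text}`).join("\n");
  return (
    "Answer using only the context below.\n\n" +
    `Context:\n${passages}\n\n` +
    `Question: ${question}`
  );
}
```

In a setup like the one described, the query vector would come from embedding the user's question with the same model used at indexing time, and the resulting prompt would be sent to the LLM.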
Quick Start & Requirements
Install: pnpm install
Run: pnpm dev
Requires a .env file with UPSTASH_VECTOR_REST_URL, UPSTASH_VECTOR_REST_TOKEN, UPSTASH_REDIS_REST_TOKEN, UPSTASH_REDIS_REST_URL, and QSTASH_TOKEN.
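Because the app needs credentials for several services, a startup check can catch a missing variable early. The following is a minimal sketch, not part of the project: the `REQUIRED_ENV_VARS` constant and the `missingEnvVars` helper are hypothetical names, and only the variable names themselves come from the requirements above.

```typescript
// Variable names taken from the requirements above; the helper itself is hypothetical.
const REQUIRED_ENV_VARS = [
  "UPSTASH_VECTOR_REST_URL",
  "UPSTASH_VECTOR_REST_TOKEN",
  "UPSTASH_REDIS_REST_URL",
  "UPSTASH_REDIS_REST_TOKEN",
  "QSTASH_TOKEN",
] as const;

// Return the names of required variables that are unset or blank in `env`.
function missingEnvVars(env: Record<string, string | undefined>): string[] {
  return REQUIRED_ENV_VARS.filter((name) => (env[name] ?? "").trim() === "");
}
```

In a Node.js process this could be called as `missingEnvVars(process.env)` before the server starts, failing fast with the list of missing names if the result is non-empty.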
.en for English).
Highlighted Details
Maintenance & Community
The project is maintained by Upstash. Contributions are welcome via issues and pull requests. Further contact information can be found in the README.
Licensing & Compatibility
The repository's license is not explicitly stated in the provided README. Compatibility for commercial use or closed-source linking would require clarification on the licensing terms.
Limitations & Caveats
The project relies heavily on Upstash services, potentially creating vendor lock-in. The setup requires obtaining and configuring credentials for multiple Upstash services. The README does not detail performance benchmarks or specific hardware requirements beyond the need for Upstash service access.