Semantic search engine for competitive programming problems
This project provides a semantic search engine for competitive programming problems, enabling users to find similar problems based on natural language descriptions. It simplifies problem statements using LLMs, generates embeddings, and performs vector searches, making it useful for competitive programmers seeking to discover new problems or identify duplicates.
How It Works
The core approach uses a Large Language Model (LLM) to simplify and paraphrase competitive programming problem statements, removing extraneous background information. These simplified texts are then embedded into vector representations. When a user queries the system, the query is embedded in the same way, and a vector search against the problem embeddings returns the most semantically similar problems. The approach depends on recent improvements in LLM capability and cost, which make it practical to run this kind of document retrieval over a whole problem archive.
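The retrieval step can be illustrated with a minimal sketch. It is not the project's implementation: the project embeds LLM-simplified statements with a learned embedding model, while here `embed` is a hypothetical bag-of-words stand-in so the example runs without any model or API. The search itself (embed the query, score every problem by cosine similarity, keep the top k) is the same in shape.

```python
import math
import re
from collections import Counter

# Hypothetical stand-in for an embedding model: a sparse bag-of-words vector.
# The real project embeds LLM-simplified statements with a learned model.
def embed(text: str) -> Counter:
    """Map text to a term-frequency vector (illustrative only)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def search(query: str, index: dict[str, Counter], k: int = 3) -> list[tuple[str, float]]:
    """Embed the query and return the k most similar (problem ID, score) pairs."""
    q = embed(query)
    scored = [(pid, cosine(q, vec)) for pid, vec in index.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]

# Made-up simplified statements (as an LLM might produce them), keyed by ID.
simplified = {
    "1000": "Given two integers a and b, output their sum.",
    "1001": "Find the shortest path between two nodes in a weighted graph.",
    "1002": "Count the number of inversions in an array of integers.",
}
index = {pid: embed(text) for pid, text in simplified.items()}

print(search("shortest path in a graph with edge weights", index, k=1))
```

Swapping `embed` for a real embedding model and the dictionary scan for a vector index changes the quality and scale of the search, but not the query flow.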
Quick Start & Requirements
1. Install dependencies: pip install -r requirements.txt
2. Place problem data in the problems/ directory as JSON files (e.g., problems/1000.json).
3. Build the index by running, in order: python -m src.build_summary, python -m src.build_embedding, and python -m src.build_locale.
4. Start the server with python -m src.ui.
Maintenance & Community
Acknowledged contributors include @fstqwq.
Limitations & Caveats
The project does not provide scraped vjudge problems or a vjudge scraper due to copyright concerns, and it does not process PDF statements. Users must acquire their own data or contribute it.
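Because the problem data must be supplied by the user, it can help to check that every file in problems/ parses before building summaries. The sketch below is a hypothetical helper, not part of the project. The JSON schema is not documented in this summary, so each file is returned unchanged and keyed by its file name without the extension (e.g., 1000 for problems/1000.json).

```python
import json
from pathlib import Path

def load_problems(directory: str = "problems") -> dict[str, object]:
    """Load every <id>.json file in `directory` into a dict keyed by <id>.

    Files that are not valid JSON are reported and skipped. A missing
    directory yields an empty dict, since Path.glob then matches nothing.
    """
    problems: dict[str, object] = {}
    for path in sorted(Path(directory).glob("*.json")):
        try:
            with path.open(encoding="utf-8") as f:
                problems[path.stem] = json.load(f)
        except json.JSONDecodeError as err:
            print(f"skipping {path.name}: {err}")
    return problems
```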