is-my-problem-new by fjzzq2002

Semantic search engine for competitive programming problems

Created 2 years ago

314 stars

Top 86.1% on SourcePulse

Project Summary

This project provides a semantic search engine for competitive programming problems, enabling users to find similar problems based on natural language descriptions. It simplifies problem statements using LLMs, generates embeddings, and performs vector searches, making it useful for competitive programmers seeking to discover new problems or identify duplicates.

How It Works

The core approach involves using a Large Language Model (LLM) to simplify and paraphrase competitive programming problem statements, removing extraneous background information. These simplified texts are then embedded into vector representations. When a user queries the system, their query is also embedded, and a vector search is performed against the problem embeddings to find semantically similar problems. This method leverages recent advancements in LLM capabilities and affordability for effective document retrieval.
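Below is a minimal sketch of the query side of that pipeline, assuming cosine similarity as the ranking metric. `toy_embed` is a hypothetical stand-in for a real embedding model such as voyage-large-2-instruct. It builds a unit-length bag-of-words vector, not a dense learned embedding. The "simplified" statements are made-up examples, not LLM output. The sketch only mirrors the overall shape of the process: embed the simplified statements once, embed each query, and rank problems by similarity.

```python
import math
import re
from collections import Counter


def toy_embed(text: str) -> dict[str, float]:
    """Hypothetical stand-in for an embedding model: a unit-length bag-of-words vector."""
    counts = Counter(re.findall(r"[a-z0-9]+", text.lower()))
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {w: c / norm for w, c in counts.items()} if norm else {}


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    # Both vectors are unit length, so their dot product is the cosine similarity.
    if len(u) > len(v):
        u, v = v, u  # iterate over the smaller sparse vector
    return sum(x * v.get(w, 0.0) for w, x in u.items())


def build_index(simplified: dict[str, str]) -> dict[str, dict[str, float]]:
    # Embed every simplified statement once, ahead of query time.
    return {pid: toy_embed(text) for pid, text in simplified.items()}


def search(query: str, index: dict[str, dict[str, float]], k: int = 3) -> list[str]:
    # Embed the query the same way, then return the k most similar problem IDs.
    q = toy_embed(query)
    ranked = sorted(index, key=lambda pid: cosine(q, index[pid]), reverse=True)
    return ranked[:k]


# Usage example with made-up "simplified" statements.
simplified = {
    "A": "given an array find the maximum sum of a contiguous subarray",
    "B": "given a tree find the diameter the longest path between two vertices",
    "C": "count the number of shortest paths in a weighted graph",
}
index = build_index(simplified)
print(search("maximum subarray sum", index, k=1))
```

A production system would use dense model embeddings and an approximate or vectorized nearest-neighbour search rather than this linear scan. The ranking step is the same idea either way.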

Quick Start & Requirements

Install dependencies via pip install -r requirements.txt.
Requires API keys for OpenAI, Together, and Voyage.
Problems should be placed in the problems/ directory as JSON files (e.g., problems/1000.json).
Run python -m src.build_summary, python -m src.build_embedding, python -m src.build_locale, and finally python -m src.ui to start the server.
A reasonably powerful CPU is recommended for the vector search.
Official site: http://yuantiji.ac
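The build order above can be wrapped in a small driver script. This is a hypothetical convenience, not part of the project. Only the four module names come from the steps above. The `check_problems_dir` helper and the `--run` flag are illustrative assumptions. The script is meant to be run from the repository root with the API keys already configured. By default it only prints the commands; passing `--run` executes them.

```python
import subprocess
import sys
from pathlib import Path

# Module names from the Quick Start, in the order they must run.
BUILD_STEPS = ["src.build_summary", "src.build_embedding", "src.build_locale"]
SERVER = "src.ui"


def pipeline_commands() -> list[list[str]]:
    # Use the current interpreter so the same virtualenv is used for every step.
    return [[sys.executable, "-m", module] for module in BUILD_STEPS + [SERVER]]


def check_problems_dir(path: str = "problems") -> int:
    # Hypothetical sanity check: fail early if there are no problem files to process.
    files = sorted(Path(path).glob("*.json"))
    if not files:
        raise FileNotFoundError(f"no *.json problem files found in {path}/")
    return len(files)


def run_pipeline(dry_run: bool = True) -> None:
    for cmd in pipeline_commands():
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops at the first failing step, so later steps never run on stale output.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    execute = "--run" in sys.argv
    if execute:
        check_problems_dir()
    run_pipeline(dry_run=not execute)
```

Stopping at the first failure matters because each step presumably consumes the previous step's output: embeddings are built from the summaries, and the UI serves the built data.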

Highlighted Details

Uses Gemma 2 9B to simplify problem statements and voyage-large-2-instruct to embed them.
Supports data sourcing from vjudge and AtCoder.
Processing about 160k problems cost roughly $60.
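As a rough back-of-the-envelope figure, that cost works out per problem as shown below. Both inputs are approximate, so the result is only an order of magnitude.

```latex
\frac{\$60}{160{,}000\ \text{problems}} \approx \$0.000375\ \text{per problem} \approx \$0.38\ \text{per } 1{,}000\ \text{problems}
```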

Maintenance & Community

The project received updates as recently as July 2024.
Contributions from users like @fstqwq are acknowledged.

Licensing & Compatibility

The README does not explicitly state a license.

Limitations & Caveats

The project does not provide scraped vjudge problems or a vjudge scraper due to copyright concerns, and it does not process PDF statements. Users must acquire their own data or contribute it.
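Since users supply their own data, here is a minimal sketch of storing self-collected problems in the problems/<id>.json layout from the Quick Start and reading them back. The record fields ("title", "statement") are hypothetical. The actual schema is whatever the project's build scripts expect, and it should be checked against the repository before use. `save_problem` and `load_problems` are illustrative helpers, not project functions.

```python
import json
import tempfile
from pathlib import Path


def save_problem(problem_id: str, record: dict, root: str = "problems") -> Path:
    """Write one problem as <root>/<problem_id>.json (field names are hypothetical)."""
    out_dir = Path(root)
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"{problem_id}.json"
    # ensure_ascii=False keeps non-English statements readable on disk.
    path.write_text(json.dumps(record, ensure_ascii=False, indent=2), encoding="utf-8")
    return path


def load_problems(root: str = "problems") -> dict[str, dict]:
    # Map each file's stem (e.g. "1000" for 1000.json) to its parsed JSON record.
    return {
        p.stem: json.loads(p.read_text(encoding="utf-8"))
        for p in sorted(Path(root).glob("*.json"))
    }


# Usage example in a throwaway directory; the record's fields are hypothetical.
with tempfile.TemporaryDirectory() as tmp:
    save_problem(
        "1000",
        {"title": "A+B Problem", "statement": "Given two integers a and b, print a + b."},
        root=tmp,
    )
    loaded = load_problems(tmp)
print(loaded["1000"]["title"])
```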

Explore Similar Projects

nixiesearch by nixiesearch

vectordb by epsilla-cloud

stark by snap-stanford

sqlite-vss by asg017

ai-powered-search by treygrainger

raglite by superlinear-ai

rag-from-scratch by pguso

sample-apps by vespa-engine

elasticsearch-labs by elastic

ColBERT by stanford-futuredata

lancedb by lancedb

qdrant by qdrant