is-my-problem-new  by fjzzq2002

Semantic search engine for competitive programming problems

created 1 year ago
280 stars

Top 93.9% on sourcepulse

Project Summary

This project provides a semantic search engine for competitive programming problems, enabling users to find similar problems based on natural language descriptions. It simplifies problem statements using LLMs, generates embeddings, and performs vector searches, making it useful for competitive programmers seeking to discover new problems or identify duplicates.

How It Works

The core approach involves using a Large Language Model (LLM) to simplify and paraphrase competitive programming problem statements, removing extraneous background information. These simplified texts are then embedded into vector representations. When a user queries the system, their query is also embedded, and a vector search is performed against the problem embeddings to find semantically similar problems. This method leverages recent advancements in LLM capabilities and affordability for effective document retrieval.
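The retrieval step can be sketched as a brute-force cosine-similarity scan. This is a minimal illustration, not the project's code. The function names are hypothetical, the toy three-dimensional vectors stand in for real embeddings of the LLM-simplified statements, and the choice of cosine similarity as the scoring function is an assumption.

```python
import heapq
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_similar(
    query: List[float], problems: Dict[str, List[float]], k: int = 5
) -> List[Tuple[str, float]]:
    """Brute-force search: score every stored problem vector, keep the k highest."""
    scored = ((pid, cosine_similarity(query, vec)) for pid, vec in problems.items())
    return heapq.nlargest(k, scored, key=lambda item: item[1])


if __name__ == "__main__":
    # Hypothetical 3-dimensional vectors standing in for real embeddings of
    # LLM-simplified statements; real embeddings have far more dimensions.
    problems = {
        "1000": [0.9, 0.1, 0.0],
        "1001": [0.0, 1.0, 0.0],
        "1002": [0.7, 0.7, 0.0],
    }
    query = [1.0, 0.0, 0.0]  # embedding of the user's query (hypothetical)
    for pid, score in top_k_similar(query, problems, k=2):
        print(pid, round(score, 3))
```

Run as a script, it prints `1000 0.994` followed by `1002 0.707`. A full scan touches every stored vector for each query, so its cost grows linearly with the number of problems.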

Quick Start & Requirements

  • Install dependencies via pip install -r requirements.txt.
  • Requires API keys for OpenAI, Together, and Voyage.
  • Problems should be placed in the problems/ directory as JSON files (e.g., problems/1000.json).
  • Run python -m src.build_summary, python -m src.build_embedding, and python -m src.build_locale in that order, then python -m src.ui to start the server.
  • A decent CPU is recommended for vector search.
  • Official site: http://yuantiji.ac
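Loading the problems/ directory might look like the following minimal sketch. Only the problems/<id>.json naming comes from the description above. The function name load_problems is hypothetical, and because the JSON fields are not specified here, each file is kept as an opaque object keyed by its filename stem.

```python
import json
from pathlib import Path
from typing import Any, Dict


def load_problems(directory: str = "problems") -> Dict[str, Any]:
    """Load every <id>.json file in `directory`, keyed by its filename stem (e.g. "1000")."""
    problems: Dict[str, Any] = {}
    for path in sorted(Path(directory).glob("*.json")):
        with path.open(encoding="utf-8") as f:
            # The schema is not documented here, so each file is stored unchanged.
            problems[path.stem] = json.load(f)
    return problems
```

For example, with problems/1000.json and problems/1001.json present, load_problems("problems") returns a dict with the keys "1000" and "1001"; files without a .json suffix are ignored.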

Highlighted Details

  • Uses Gemma 2 9B to simplify problem statements and voyage-large-2-instruct to embed them.
  • Supports data sourcing from vjudge and AtCoder.
  • Processing ~160k problems cost approximately $60.
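As a rough per-problem figure, derived only from the two approximate numbers above (so it inherits their imprecision and does not separate LLM from embedding costs):

```latex
\frac{\$60}{160{,}000\ \text{problems}} \approx \$0.000375\ \text{per problem} \approx \$0.38\ \text{per } 1{,}000\ \text{problems}
```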

Maintenance & Community

  • Recent updates were made in July 2024.
  • Contributions from users like @fstqwq are acknowledged.

Licensing & Compatibility

  • The README does not explicitly state a license.

Limitations & Caveats

The project does not provide scraped vjudge problems or a vjudge scraper due to copyright concerns, and it does not process PDF statements. Users must acquire their own data or contribute it.

Health Check

  • Last commit: 9 months ago
  • Responsiveness: 1 week
  • Pull requests (30d): 0
  • Issues (30d): 2
  • Star history: 20 stars in the last 90 days
