ohmycaptcha  by shenhao-stu

Self-hostable captcha solver with multimodal AI capabilities

Created 1 month ago
634 stars

Top 52.1% on SourcePulse

GitHubView on GitHub
Project Summary

OhMyCaptcha provides a self-hostable, YesCaptcha-compatible service for solving a wide array of CAPTCHAs. It is designed for developers integrating automated systems with services that require CAPTCHA resolution, offering a flexible solution that combines browser automation with advanced AI models. The primary benefit is a unified, self-managed API endpoint that mimics popular commercial CAPTCHA-solving services.

How It Works

The project employs a hybrid architecture. For complex web-based challenges like reCAPTCHA, hCaptcha, and Cloudflare Turnstile, it utilizes Playwright with Chromium for browser automation. For image-based CAPTCHAs, it integrates local or cloud-hosted OpenAI-compatible multimodal models (e.g., Qwen3.5-2B via SGLang) for recognition and classification. This dual approach allows OhMyCaptcha to handle both dynamic web interactions and visual puzzle-solving efficiently, exposing all capabilities through a standard YesCaptcha-style API.

Quick Start & Requirements

  • Primary install: Set up a Python virtual environment, activate it, and run pip install -r requirements.txt.
  • Non-default prerequisites:
    • Install browser dependencies: playwright install --with-deps chromium.
    • Configure model backend via environment variables: LOCAL_BASE_URL and LOCAL_MODEL for local inference, or CLOUD_BASE_URL, CLOUD_API_KEY, and CLOUD_MODEL for cloud endpoints.
    • A CLIENT_KEY is required for authentication.
  • Run command: python main.py.
  • Links: Documentation and Render/Hugging Face deployment guides are mentioned.

Highlighted Details

  • Supports 19 distinct task types, including various reCAPTCHA (v2/v3/Enterprise), hCaptcha, and Cloudflare Turnstile challenges.
  • Offers image recognition and classification capabilities using local or cloud multimodal vision models.
  • Fully implements the YesCaptcha createTask/getTaskResult/getBalance API protocol for easy integration.
  • Leverages Playwright for robust browser automation to solve dynamic web-based CAPTCHAs.

Maintenance & Community

No specific details regarding maintainers, community channels (e.g., Discord, Slack), or project roadmaps are provided in the README.

Licensing & Compatibility

The project is licensed under the MIT license, permitting free use, modification, and deployment. However, a comprehensive disclaimer emphasizes that users are solely responsible for ensuring their deployment and usage comply with all applicable laws and third-party terms of service.

Limitations & Caveats

Tasks are managed in memory with a 10-minute Time-To-Live (TTL). The minScore parameter is accepted for compatibility but not actively enforced. The success rate of browser-based solving is dependent on environmental factors, IP reputation, and the target website's behavior. Image classification accuracy is contingent on the chosen vision model's performance. Not all features found in commercial CAPTCHA-solving services are replicated.

Health Check
Last Commit

1 month ago

Responsiveness

Inactive

Pull Requests (30d)
2
Issues (30d)
4
Star History
291 stars in the last 30 days

Explore Similar Projects

Starred by Chip Huyen Chip Huyen(Author of "AI Engineering", "Designing Machine Learning Systems") and Yaowei Zheng Yaowei Zheng(Author of LLaMA-Factory).

AstrBot by AstrBotDevs

2.2%
30k
LLM chatbot/framework for multiple platforms
Created 3 years ago
Updated 4 hours ago
Feedback? Help us improve.