ohmycaptcha by shenhao-stu

Self-hostable captcha solver with multimodal AI capabilities

Created 2 months ago

745 stars

Top 46.1% on SourcePulse

Project Summary

OhMyCaptcha provides a self-hostable, YesCaptcha-compatible service for solving a wide array of CAPTCHAs. It is designed for developers integrating automated systems with services that require CAPTCHA resolution, offering a flexible solution that combines browser automation with advanced AI models. The primary benefit is a unified, self-managed API endpoint that mimics popular commercial CAPTCHA-solving services.

How It Works

The project employs a hybrid architecture. For complex web-based challenges like reCAPTCHA, hCaptcha, and Cloudflare Turnstile, it utilizes Playwright with Chromium for browser automation. For image-based CAPTCHAs, it integrates local or cloud-hosted OpenAI-compatible multimodal models (e.g., Qwen3.5-2B via SGLang) for recognition and classification. This dual approach allows OhMyCaptcha to handle both dynamic web interactions and visual puzzle-solving efficiently, exposing all capabilities through a standard YesCaptcha-style API.

Quick Start & Requirements

Primary install: Set up a Python virtual environment, activate it, and run pip install -r requirements.txt.
Non-default prerequisites:
- Install browser dependencies: playwright install --with-deps chromium.
- Configure model backend via environment variables: LOCAL_BASE_URL and LOCAL_MODEL for local inference, or CLOUD_BASE_URL, CLOUD_API_KEY, and CLOUD_MODEL for cloud endpoints.
- A CLIENT_KEY is required for authentication.
Run command: python main.py.
Links: Documentation and Render/Hugging Face deployment guides are mentioned.

Highlighted Details

Supports 19 distinct task types, including various reCAPTCHA (v2/v3/Enterprise), hCaptcha, and Cloudflare Turnstile challenges.
Offers image recognition and classification capabilities using local or cloud multimodal vision models.
Fully implements the YesCaptcha createTask/getTaskResult/getBalance API protocol for easy integration.
Leverages Playwright for robust browser automation to solve dynamic web-based CAPTCHAs.

Maintenance & Community

No specific details regarding maintainers, community channels (e.g., Discord, Slack), or project roadmaps are provided in the README.

Licensing & Compatibility

The project is licensed under the MIT license, permitting free use, modification, and deployment. However, a comprehensive disclaimer emphasizes that users are solely responsible for ensuring their deployment and usage comply with all applicable laws and third-party terms of service.

Limitations & Caveats

Tasks are managed in memory with a 10-minute Time-To-Live (TTL). The minScore parameter is accepted for compatibility but not actively enforced. The success rate of browser-based solving is dependent on environmental factors, IP reputation, and the target website's behavior. Image classification accuracy is contingent on the chosen vision model's performance. Not all features found in commercial CAPTCHA-solving services are replicated.

Health Check

Last Commit

2 months ago

Responsiveness

Inactive

Pull Requests (30d)

Issues (30d)

Star History

31 stars in the last 30 days