cnn_for_captcha by anexplore

Image captcha solver (digits, text, rotation, object similarity)

created 4 years ago

282 stars

Top 93.5% on sourcepulse

Project Summary

This repository collects deep learning-based solvers for several image CAPTCHA types: fixed-length text, sliding puzzles, point-and-click text, rotation, and similar-object CAPTCHAs. It targets developers and researchers who want to automate or analyze CAPTCHA systems, whether to bypass these security measures or to understand them.

How It Works

The project leverages several deep learning techniques tailored to specific CAPTCHA types. For fixed-length text CAPTCHAs, it uses CNNs trained on custom datasets with specific naming conventions. Sliding CAPTCHAs are addressed using either OpenCV's template matching for simplicity or YOLOv5 for more robust object detection of the slider gap. Point-and-click CAPTCHAs involve object detection (YOLOv5) to locate candidate characters, followed by matching strategies that may include OCR or Siamese networks for image-based character comparison. Rotation CAPTCHAs are tackled via regression (predicting rotation angle) or classification, often using ResNet50 for feature extraction. Similar object CAPTCHAs utilize YOLOv5 for object detection and classification.
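
To make the template-matching route for sliding CAPTCHAs concrete, here is a minimal pure-Python sketch. It is not the repository's code: the project uses OpenCV, whereas this stand-in slides a small patch over a grayscale image and scores every offset by the sum of squared differences, which is the criterion behind OpenCV's `cv2.TM_SQDIFF` mode. The function name `find_gap_offset` and the toy data are assumptions made for illustration.

```python
from typing import List, Tuple

Gray = List[List[int]]  # grayscale image as rows of pixel intensities


def find_gap_offset(background: Gray, piece: Gray) -> Tuple[int, int]:
    """Return (x, y) of the top-left corner where `piece` matches best.

    Hypothetical helper: exhaustive sum-of-squared-differences search,
    where a lower score means a better match.
    """
    bg_h, bg_w = len(background), len(background[0])
    p_h, p_w = len(piece), len(piece[0])
    if p_h > bg_h or p_w > bg_w:
        raise ValueError("piece must fit inside background")

    best_score = None
    best_xy = (0, 0)
    for y in range(bg_h - p_h + 1):
        for x in range(bg_w - p_w + 1):
            score = 0
            for dy in range(p_h):
                row_bg = background[y + dy]
                row_p = piece[dy]
                for dx in range(p_w):
                    diff = row_bg[x + dx] - row_p[dx]
                    score += diff * diff
            if best_score is None or score < best_score:
                best_score, best_xy = score, (x, y)
    return best_xy


# Toy data: a flat 6x8 background with a distinctive 2x2 pattern at x=5, y=3.
background = [[10] * 8 for _ in range(6)]
piece = [[200, 50], [50, 200]]
for dy in range(2):
    for dx in range(2):
        background[3 + dy][5 + dx] = piece[dy][dx]

gap_x, gap_y = find_gap_offset(background, piece)
print(gap_x, gap_y)
```

For a slider, the x offset is usually the value of interest. With OpenCV the same search is typically a `cv2.matchTemplate` call followed by `cv2.minMaxLoc`; the YOLOv5 route trades this simplicity for robustness when the gap is blurred or partly occluded.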

Quick Start & Requirements

Installation: pip install -r requirements.txt. The list is comprehensive, so install only the packages your CAPTCHA type needs.
Prerequisites: Python, OpenCV, and YOLOv5; depending on the CAPTCHA type, possibly also PaddleOCR, Tesseract, cnocr, and PyTorch. A CUDA-capable GPU is highly recommended for training and for efficient inference.
Setup: Prepare a labeled dataset for each CAPTCHA type you want to train. Training time varies with dataset size and hardware.
Resources: The README links to training scripts, predictor classes, and data preparation utilities.
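
Because every CAPTCHA type needs its own labeled dataset, a reproducible train/validation split is usually one of the first preparation steps. The sketch below only illustrates that step and is not the repository's own splitting utility: `split_dataset`, the 80/20 ratio, and the file names are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def split_dataset(
    items: Sequence[str], val_ratio: float = 0.2, seed: int = 0
) -> Tuple[List[str], List[str]]:
    """Shuffle deterministically and split into (train, val)."""
    if not 0.0 < val_ratio < 1.0:
        raise ValueError("val_ratio must be between 0 and 1")
    shuffled = sorted(items)  # sort first so the input order does not matter
    random.Random(seed).shuffle(shuffled)
    n_val = max(1, round(len(shuffled) * val_ratio))
    return shuffled[n_val:], shuffled[:n_val]


# Hypothetical file names; the repository's actual naming convention may differ.
files = [f"sample_{i:03d}.png" for i in range(10)]
train, val = split_dataset(files)
print(len(train), len(val))
```

Fixing the seed keeps the validation set stable across runs, which makes accuracy numbers comparable as the training set grows.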

Highlighted Details

Offers implementations for fixed-length text, sliding, point-click, rotation, and similar object CAPTCHAs.
Utilizes YOLOv5 for various detection tasks, including sliding puzzle gaps and object identification.
Explores both regression and classification approaches for rotation CAPTCHAs.
Includes a section on leveraging large multimodal models (Gemini, GPT-4o, Claude 3.5) for CAPTCHA recognition, noting their current limitations and potential.
Provides utility scripts for data splitting and format conversion (e.g., labelme_json_to_yolov5_format.py).
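
As an illustration of the kind of format conversion mentioned in the last item, the sketch below turns a labelme-style annotation into YOLOv5 label lines (`class x_center y_center width height`, all normalized to the image size). It is an independent sketch, not the contents of labelme_json_to_yolov5_format.py: the function name `labelme_to_yolo_lines`, the class mapping, and the sample annotation are assumptions.

```python
import json
from typing import Dict, List


def labelme_to_yolo_lines(annotation: dict, class_ids: Dict[str, int]) -> List[str]:
    """Convert one labelme annotation dict into YOLOv5 label lines.

    Each line is "class x_center y_center width height", normalized to [0, 1].
    """
    img_w = annotation["imageWidth"]
    img_h = annotation["imageHeight"]
    lines = []
    for shape in annotation["shapes"]:
        # Take the bounding box of all points, so rectangles and polygons both work.
        xs = [p[0] for p in shape["points"]]
        ys = [p[1] for p in shape["points"]]
        x_min, x_max = min(xs), max(xs)
        y_min, y_max = min(ys), max(ys)
        x_c = (x_min + x_max) / 2 / img_w
        y_c = (y_min + y_max) / 2 / img_h
        w = (x_max - x_min) / img_w
        h = (y_max - y_min) / img_h
        cls = class_ids[shape["label"]]
        lines.append(f"{cls} {x_c:.6f} {y_c:.6f} {w:.6f} {h:.6f}")
    return lines


# Hypothetical annotation, shaped like a labelme JSON file.
example = json.loads("""
{
  "imageWidth": 200,
  "imageHeight": 100,
  "shapes": [
    {"label": "gap", "shape_type": "rectangle", "points": [[50, 20], [90, 60]]}
  ]
}
""")
print(labelme_to_yolo_lines(example, {"gap": 0}))
```

YOLOv5 expects one such text file per image, with one line per labeled object.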

Maintenance & Community

The repository is maintained by anexplore. No specific community channels (Discord, Slack) or roadmap links are provided in the README.

Licensing & Compatibility

The README does not state a license, so commercial use would require clarification from the maintainer. The presence of requirements.txt suggests compatibility with standard Python environments.

Limitations & Caveats

Performance depends heavily on the size and quality of the training dataset for each CAPTCHA type. Some methods may not be effective in every case; OCR, for example, can fail on deformed text. The project notes that large multimodal models, while promising, currently have slower inference and higher resource requirements, and may need fine-tuning for optimal performance.

Health Check

Last commit: 11 months ago
Responsiveness: 1 day

Star History

18 stars in the last 90 days