insane-search  by fivetaku

Adaptive web scraper for resilient data extraction

Created 3 weeks ago

572 stars

Top 56.2% on SourcePulse

Project Summary

insane-search is a Claude Code plugin designed to get past common web blocking mechanisms such as WAFs, CAPTCHAs, and login walls, so that users can reach content on sites that resist ordinary fetching. It targets Claude Code users who frequently hit access restrictions, and it needs no configuration. The primary benefit is reliable web content access without API keys or complex setup.

How It Works

The core of insane-search is a 5-phase adaptive scheduler that escalates probing techniques. It begins with lightweight methods, progresses to TLS fingerprint impersonation with identity spoofing (including cookie warming and referrer chains), and ends with a full browser environment using Playwright. This multi-phase approach lets it discover hidden APIs by monitoring network traffic and adapt to site-specific challenges. Its stated novelty is a "never give up" philosophy: it transparently auto-installs missing dependencies such as curl_cffi and yt-dlp, and it does not pre-judge any site as inaccessible.
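The escalation idea can be illustrated with a minimal sketch. This is not the plugin's actual code: the `Phase` type, the `fetch_with_escalation` function, and the phase names in the example are all hypothetical, and the probes are stand-in functions rather than real network calls. The sketch only shows the control flow of trying cheaper methods first and moving to heavier ones when a probe is blocked.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple

# A probe takes a URL and returns page content, or None if it was blocked.
Probe = Callable[[str], Optional[str]]


@dataclass
class Phase:
    name: str    # hypothetical label, not one of the plugin's real phase names
    probe: Probe


def fetch_with_escalation(
    url: str, phases: Sequence[Phase]
) -> Tuple[Optional[str], List[str]]:
    """Try each phase in order and stop at the first one that returns content.

    Returns the content (or None if every phase failed) together with the
    names of the phases that were attempted.
    """
    attempted: List[str] = []
    for phase in phases:
        attempted.append(phase.name)
        try:
            content = phase.probe(url)
        except Exception:
            # Treat any probe error as "blocked" and escalate to the next phase.
            content = None
        if content:
            return content, attempted
    return None, attempted


if __name__ == "__main__":
    # Stand-in probes: the first is blocked, the second succeeds.
    phases = [
        Phase("plain-http", lambda url: None),
        Phase("tls-impersonation", lambda url: "<html>stub for " + url + "</html>"),
        Phase("full-browser", lambda url: "<html>never reached</html>"),
    ]
    content, attempted = fetch_with_escalation("https://example.com", phases)
    print(attempted)
```

Ordering the phases from cheapest to most expensive means the heavy browser phase runs only when the lighter ones have already failed.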

Quick Start & Requirements

  • Install: inside Claude Code, add the plugin marketplace (/plugin marketplace add https://github.com/fivetaku/gptaku_plugins.git), install the plugin (/plugin install insane-search), then restart Claude Code.
  • Prerequisites: the Claude Code environment. Dependencies such as curl_cffi, feedparser, and yt-dlp are auto-installed on demand. The gh CLI and Playwright are optional enhancements.
  • Setup footprint: zero configuration, with no API keys or environment variables needed. Dependencies are installed transparently on first use.
  • Docs: the README serves as the primary documentation.
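On-demand dependency installation of the kind described above is commonly done by attempting an import and falling back to pip. The following is a minimal sketch of that general pattern, not the plugin's implementation; the helper name `ensure_module` is hypothetical.

```python
import importlib
import subprocess
import sys
from types import ModuleType
from typing import Optional


def ensure_module(module_name: str, pip_name: Optional[str] = None) -> ModuleType:
    """Import a module, installing it with pip first if it is missing.

    pip_name is the distribution name when it differs from the import name,
    for example the import name "yt_dlp" versus the package name "yt-dlp".
    """
    try:
        return importlib.import_module(module_name)
    except ImportError:
        # Install into the interpreter that is running this code.
        subprocess.check_call(
            [sys.executable, "-m", "pip", "install", pip_name or module_name]
        )
        # Make the freshly installed package visible to the import system.
        importlib.invalidate_caches()
        return importlib.import_module(module_name)


if __name__ == "__main__":
    # A standard-library module is always present, so nothing is installed here.
    json_module = ensure_module("json")
    print(json_module.__name__)
```

A call such as `ensure_module("yt_dlp", pip_name="yt-dlp")` would trigger an install only when the import fails, which is why the cost is paid on first use rather than at setup time.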

Highlighted Details

  • 5-Phase Adaptive Scheduler: Escalates from basic probes to advanced TLS impersonation and full browser analysis.
  • Identity Spoofing: Employs detailed browser identity replication, including TLS fingerprints, cookies, and locale-matched headers.
  • Automatic Dependency Management: Seamlessly installs required libraries (e.g., curl_cffi, yt-dlp) when needed.
  • Hidden API Discovery: Utilizes Playwright to capture and reuse backend JSON APIs for data extraction.
  • Extensive Platform Support: Covers numerous sites like X, Reddit, YouTube, Naver, Coupang, LinkedIn, arXiv, and GitHub through specific APIs or adaptive probing.
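To illustrate the hidden API discovery idea, the sketch below filters a list of captured network responses down to likely backend JSON endpoints. It is an assumption about how such filtering could work, not the plugin's logic: the `CapturedResponse` record and `find_json_api_candidates` function are hypothetical, and the records are built by hand here rather than captured from a live Playwright session.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class CapturedResponse:
    # Hypothetical record of one response observed while a page loads.
    url: str
    status: int
    content_type: str
    resource_type: str  # e.g. "xhr", "fetch", "document", "image"


def find_json_api_candidates(responses: Iterable[CapturedResponse]) -> List[str]:
    """Return unique URLs of successful background requests that returned JSON."""
    seen = set()
    candidates: List[str] = []
    for response in responses:
        is_background = response.resource_type in ("xhr", "fetch")
        is_json = "json" in response.content_type.lower()
        is_success = 200 <= response.status < 300
        if is_background and is_json and is_success and response.url not in seen:
            seen.add(response.url)
            candidates.append(response.url)
    return candidates


if __name__ == "__main__":
    captured = [
        CapturedResponse("https://example.com/", 200, "text/html", "document"),
        CapturedResponse("https://example.com/api/items", 200, "application/json", "xhr"),
        CapturedResponse("https://example.com/logo.png", 200, "image/png", "image"),
        CapturedResponse("https://example.com/api/items", 200, "application/json", "fetch"),
        CapturedResponse("https://example.com/api/private", 403, "application/json", "xhr"),
    ]
    print(find_json_api_candidates(captured))
```

Once a candidate endpoint is known, later requests can go to it directly instead of rendering the full page again.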

Maintenance & Community

The provided README does not detail specific contributors, sponsorships, or community channels such as Discord or Slack.

Licensing & Compatibility

  • License: the project is released under the MIT license.
  • Commercial use: the MIT license is permissive and allows commercial use and integration into closed-source applications, provided the copyright and license notice are retained.

Limitations & Caveats

The tool cannot retrieve content from sites that require explicit authentication ("authentication required"). It acts as a method-selection layer rather than a traditional scraper, and it depends on the Claude Code environment for execution.

Health Check

  • Last Commit: 6 days ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 1
  • Star History: 573 stars in the last 21 days
