HeadlessX  by saifyxpro

Self-hosted browser automation for undetected scraping and AI workflows

Created 6 months ago
1,811 stars

Top 23.4% on SourcePulse

Project Summary

Summary

HeadlessX is a self-hosted, open-source browser automation platform for scalable, private web scraping and search workflows. It targets developers and power users who need robust data extraction while avoiding bot detection. Its main selling point is Camoufox, a Firefox-based engine that the project promotes as highly resistant to detection, combined with a broad feature set for managing complex scraping tasks.

How It Works

The core architecture is built on Node.js and Next.js, with a Redis-backed queue that handles asynchronous job processing. The key differentiator is its integration with Camoufox, a custom Firefox engine built for anti-detection and intended to leave a minimal footprint during scraping. HeadlessX provides a web dashboard for configuration and monitoring, a protected API for programmatic control, and a remote MCP endpoint for external integration, so the pieces can be chained into automation pipelines.
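
To make the queue-backed flow concrete, here is a minimal sketch of the pattern: the API layer enqueues a job and returns at once, a worker picks jobs up later, and the client polls for status. It uses an in-memory array as a stand-in for Redis and a synchronous handler for brevity. Every name here (InMemoryJobQueue, fakeScrape, the Job fields) is hypothetical and not taken from the HeadlessX codebase.

```typescript
// Illustrative in-memory stand-in for a Redis-backed job queue.
// None of these names come from the HeadlessX codebase.

type JobStatus = "queued" | "done" | "failed";

interface Job {
  id: number;
  url: string;
  status: JobStatus;
  result?: string;
  error?: string;
}

class InMemoryJobQueue {
  private nextId = 1;
  private pending: number[] = []; // FIFO of job ids awaiting a worker
  private jobs = new Map<number, Job>();

  // Called by the API layer: record the job and return immediately.
  enqueue(url: string): number {
    const id = this.nextId++;
    this.jobs.set(id, { id, url, status: "queued" });
    this.pending.push(id);
    return id;
  }

  // Called by a worker: take the oldest job and run the handler on it.
  // Returns false when there is nothing left to process.
  processNext(handler: (url: string) => string): boolean {
    const id = this.pending.shift();
    if (id === undefined) return false;
    const job = this.jobs.get(id)!;
    try {
      job.result = handler(job.url);
      job.status = "done";
    } catch (err) {
      job.error = err instanceof Error ? err.message : String(err);
      job.status = "failed";
    }
    return true;
  }

  // Called by the API layer when a client polls for job status.
  get(id: number): Job | undefined {
    return this.jobs.get(id);
  }
}

// Hypothetical handler standing in for a real browser scrape.
function fakeScrape(url: string): string {
  if (!url.startsWith("https://")) throw new Error("unsupported URL");
  return `<html>content of ${url}</html>`;
}

const queue = new InMemoryJobQueue();
const okId = queue.enqueue("https://example.com");
const badId = queue.enqueue("ftp://example.com");

// Worker loop: drain the queue one job at a time.
while (queue.processNext(fakeScrape)) {
  // nothing to do here; processNext updates the job record
}

console.log(queue.get(okId)?.status, queue.get(badId)?.status);
```

A failed job is recorded rather than thrown, so one bad URL does not stop the worker from processing the rest of the queue. A real deployment would replace the array with a durable Redis-backed queue and run the worker in a separate process.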

Quick Start & Requirements

  • Primary Install: Clone the repository, install dependencies with pnpm install, and start the development stack with pnpm dev. Docker Compose (infra/docker/docker-compose.yml) is also supported for a full-stack setup.
  • Prerequisites: Node.js 22+, pnpm 9+, PostgreSQL, Redis, Python/uv (for yt-engine), Go (for HTML-to-Markdown sidecar).
  • Links: Setup Guide, API Reference.
  • Resource Footprint: Local development requires running multiple services (API, web, worker, engines); Docker Compose simplifies this.
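
As a small aid for the prerequisites above, the following sketch checks a tool's version string against the stated minimums (Node.js 22+, pnpm 9+). The helper names (MINIMUM_MAJOR, majorVersion, meetsMinimum) are illustrative and not part of HeadlessX; only the two minimum versions come from the list above.

```typescript
// Minimum major versions taken from the prerequisites list above.
const MINIMUM_MAJOR: Record<string, number> = { node: 22, pnpm: 9 };

// Extract the major version from strings like "v22.3.0" or "9.12.1".
// Returns null when the string does not start with a version number.
function majorVersion(version: string): number | null {
  const match = /^v?(\d+)/.exec(version.trim());
  return match ? Number(match[1]) : null;
}

// True when the tool is known and its major version is high enough.
function meetsMinimum(tool: string, version: string): boolean {
  const required = MINIMUM_MAJOR[tool];
  const major = majorVersion(version);
  return required !== undefined && major !== null && major >= required;
}

// Example version strings; real values would come from the local toolchain.
const checks: Array<[string, string]> = [
  ["node", "v22.3.0"],
  ["node", "v20.11.1"],
  ["pnpm", "9.12.1"],
];

for (const [tool, version] of checks) {
  const verdict = meetsMinimum(tool, version) ? "ok" : "too old or unknown";
  console.log(`${tool} ${version}: ${verdict}`);
}
```

In a Node.js script, the running interpreter's version is available as process.version, and the pnpm version can be read from the output of pnpm --version.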

Highlighted Details

  • Leverages Camoufox (Firefox) for advanced anti-detection capabilities.
  • Features a web dashboard, protected API, and queue-backed job flows with Redis.
  • Integrates with search providers: Google AI Search, Tavily, Exa, and YouTube.
  • Provides a remote MCP endpoint secured via dashboard-generated API keys.
  • Includes an installable CLI skill for AI coding agents.
  • Extensive roadmap of planned scrapers for platforms like Google Maps, X, LinkedIn, Amazon, and more.

Maintenance & Community

No specific details regarding maintainers, sponsorships, or community channels (e.g., Discord, Slack) are present in the provided README.

Licensing & Compatibility

Licensed under the MIT license, which is generally permissive for commercial use and integration into closed-source projects.

Limitations & Caveats

Queue-backed features are degraded if Redis is not configured. Several advanced scrapers and the next-generation headfox browser engine are still in the "Planned" development stage. The yt-engine and HTML-to-Markdown services require separate Python and Go runtimes, respectively.

Health Check

  • Last Commit: 1 day ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 12
  • Issues (30d): 4
  • Star History: 156 stars in the last 30 days
