cli by firecrawl

Web scraping and AI agent capabilities via CLI

Created 3 months ago
259 stars

Top 97.7% on SourcePulse

Project Summary

Summary

Firecrawl CLI is a command-line tool that brings web scraping, crawling, and AI agent integration to a developer's terminal. It targets AI engineers, researchers, and power users who need to programmatically extract and process web data for applications, agent development, or automated workflows. Its primary benefit is structured data extraction from the web that plugs directly into AI agent toolkits.

How It Works

The tool operates as a Node.js-based CLI, offering commands for scraping individual URLs, searching the web, mapping website structures, and crawling entire sites. It supports various output formats, including clean markdown, JSON, raw HTML, extracted links, images, and AI-generated summaries. For AI integration, it provides commands to set up "skills" and an "MCP server" for agents like Claude Code, and offers experimental AI workflows that combine web data extraction with agent execution for tasks like competitor analysis or deep research. Authentication can be handled via browser-based login or direct API key input, with support for self-hosted instances using a custom API URL.

Quick Start & Requirements

  • Installation: npm install -g firecrawl-cli
  • Prerequisites: Node.js and npm/pnpm. A browser is required for initial authentication unless an API key is supplied directly.
  • Setup: npx -y firecrawl-cli@latest init -y --browser -y for non-interactive setup including browser authentication and skill installation.
  • Documentation: Firecrawl Documentation
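
The prerequisite check and install step above can be sketched as a small shell helper. This is a minimal sketch, not part of the project: `require_tool` and `install_firecrawl_cli` are hypothetical function names introduced here for illustration, and only the `npm install -g firecrawl-cli` command comes from the list above.

```shell
# Hypothetical helper (not part of firecrawl-cli): report whether a
# command is available on PATH, returning non-zero when it is missing.
require_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "found: $1"
  else
    echo "missing: $1" >&2
    return 1
  fi
}

# Hypothetical wrapper: run the documented install command only when
# both Node.js and npm are present.
install_firecrawl_cli() {
  require_tool node && require_tool npm && npm install -g firecrawl-cli
}
```

Calling `install_firecrawl_cli` stops at the first missing prerequisite instead of failing partway through the install. The Setup command listed above can then be run as a separate step.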

Highlighted Details

  • Comprehensive scraping options: Extract main content only, wait for JavaScript rendering, specify included/excluded HTML tags, take screenshots, and cache content.
  • Advanced web search: Filter results by source (web, news, images), category (GitHub, research papers, PDFs), time, and location.
  • Bulk site download: The download command maps a site and scrapes each page into a local directory structure, supporting various formats and filters.
  • AI Agent Integration: Add scraping and browsing capabilities to AI agents via "skills" and "MCP" server setup, alongside experimental AI workflows for complex research tasks.

Maintenance & Community

No specific details regarding maintainers, community channels (like Discord/Slack), or project health signals were found in the provided README.

Licensing & Compatibility

The provided README does not specify a software license. This lack of explicit licensing information presents a significant adoption blocker, as it leaves the terms of use, modification, and distribution unclear, particularly for commercial applications.

Limitations & Caveats

The browser command is deprecated in favor of scrape and interact. The AI Workflows are explicitly marked as experimental. By default, the CLI collects anonymous usage telemetry (version, OS, Node.js version, detected development tools) unless disabled via the FIRECRAWL_NO_TELEMETRY=1 environment variable.
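
As a minimal sketch of the telemetry opt-out described above, the variable can be exported in the shell before the CLI is invoked. Only the variable name and value come from the README summary; where to persist the setting is left to the reader.

```shell
# Disable anonymous usage telemetry for the current shell session
# and for every process started from it.
export FIRECRAWL_NO_TELEMETRY=1
```

Adding the same line to a shell profile file keeps the setting across sessions.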

Health Check

  • Last Commit: 6 days ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 19
  • Issues (30d): 1
  • Star History: 123 stars in the last 30 days
