deepcrawl by lumpinif

Website data extraction platform for AI agents

Created 7 months ago
448 stars

Top 67.0% on SourcePulse

Project Summary

Deepcrawl is a free, open-source platform that runs at the edge and extracts website content as context for AI agents, positioned as an alternative to Firecrawl. It returns cleaned markdown, hierarchical link trees, and LLM-digestible metadata, with the aim of reducing token costs, context switching, and hallucinations in agent applications. It can be self-deployed on services such as Cloudflare or Vercel, which suits developers who want control over their data extraction pipeline.

How It Works

Deepcrawl is an agent-oriented platform for extracting context from websites. It parses web pages to produce three outputs: cleaned markdown content, a hierarchical link tree suited to agents, and essential metadata. This structured output is meant for consumption by Large Language Models (LLMs), with the aim of lowering token usage and improving the efficiency and accuracy of agent responses.
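
To make the idea of a hierarchical link tree concrete, here is a minimal Python sketch that groups a flat list of URLs by host and path segment. It is an illustration only, not Deepcrawl's implementation or output format: the function names `build_link_tree` and `render_tree`, the nested-dict shape, and the example.com URLs are all assumptions made for this example.

```python
from urllib.parse import urlparse


def build_link_tree(urls):
    """Group a flat list of URLs into a nested dict keyed by path segment.

    Hypothetical helper for illustration; not part of Deepcrawl.
    """
    tree = {}
    for url in urls:
        parsed = urlparse(url)
        # Start at the host, then descend one level per non-empty path segment.
        node = tree.setdefault(parsed.netloc, {})
        for segment in parsed.path.split("/"):
            if segment:
                node = node.setdefault(segment, {})
    return tree


def render_tree(tree, indent=0):
    """Return the tree as indented lines, one node per line, sorted by name."""
    lines = []
    for name in sorted(tree):
        lines.append("  " * indent + name)
        lines.extend(render_tree(tree[name], indent + 1))
    return lines


# Placeholder URLs, chosen only to show the nesting.
links = [
    "https://example.com/docs/intro",
    "https://example.com/docs/api/read",
    "https://example.com/blog",
]
tree = build_link_tree(links)
print("\n".join(render_tree(tree)))
```

An indented tree like this states each shared path prefix once instead of repeating it in every URL, which is one plausible way a link tree could use fewer tokens than a flat link list.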

Quick Start & Requirements

  • Deployment: Self-deployable to Cloudflare or Vercel. The README excerpt does not give specific commands.
  • Prerequisites: Not explicitly listed. The deployment targets (Cloudflare, Vercel) suggest that familiarity with edge computing environments is expected.
  • Documentation: Available at https://deepcrawl.dev/docs.

Highlighted Details

  • Positioned as a 100% free and open-source edge alternative to Firecrawl.
  • Features enhanced link extraction capabilities specifically for AI agents.
  • Aims for better performance and flexibility compared to alternatives.
  • The full platform includes a Next.js dashboard, API Workers, Auth Workers, and a database, all open and transparent.

Maintenance & Community

  • Developed by @felixLu.
  • No specific community channels (e.g., Discord, Slack) or roadmap links were provided in the README excerpt.

Licensing & Compatibility

  • License: Described as "Open Source. Open Code." The specific license type and its implications for commercial use or derivative works require further clarification.
  • Compatibility: Self-deployable nature suggests broad compatibility with supported edge platforms.

Limitations & Caveats

The project is explicitly marked with a warning: "DO NOT USE DEEPCRAWL IN PRODUCTION RIGHT NOW AS IT IS SUBJECT TO CHANGE AND STILL UNDER RAPID DEVELOPMENT. USE WITH YOUR OWN RISK!" This indicates potential instability and breaking changes.

Health Check

  • Last commit: 1 week ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 2
  • Star history: 426 stars in the last 30 days
