x-crawl  by coder-hxl

Node.js library for AI-assisted web crawling

created 2 years ago
1,741 stars

Top 25.1% on sourcepulse

Project Summary

This library provides a flexible Node.js web crawler with AI-assisted capabilities, targeting developers who need to efficiently extract data from dynamic or static web pages, APIs, and files. It simplifies complex crawling tasks by integrating with OpenAI and Ollama, allowing for semantic understanding of web content and resilience against website structure changes.

How It Works

x-crawl uses a headless browser (Puppeteer) to render and interact with dynamic pages. Its distinguishing feature is AI integration: users can pass HTML content or specific elements to OpenAI or Ollama models for data extraction, summarization, or transformation. Describing the target data in natural language reduces reliance on brittle CSS selectors or XPath, which makes crawlers more robust against website updates.
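
The AI-assisted extraction idea described above can be sketched roughly as follows. This is a minimal illustration, not x-crawl's actual API: `ModelFn`, `buildExtractionPrompt`, `parseModelReply`, and `extractWithModel` are hypothetical names, and the model call is injected as a function so that an OpenAI or Ollama client could be plugged in.

```typescript
// Hypothetical helpers for illustration only; none of these are x-crawl APIs.

// Any function that sends a prompt to a model and resolves with its text reply.
type ModelFn = (prompt: string) => Promise<string>;

interface ExtractedItem {
  [field: string]: string;
}

// Combine a natural-language instruction with the raw HTML into one prompt.
function buildExtractionPrompt(html: string, instruction: string): string {
  return [
    "You extract structured data from HTML.",
    `Task: ${instruction}`,
    "Reply with a JSON array of objects and nothing else.",
    "HTML:",
    html,
  ].join("\n");
}

// Models often wrap JSON in extra prose, so slice out the outermost array.
function parseModelReply(reply: string): ExtractedItem[] {
  const start = reply.indexOf("[");
  const end = reply.lastIndexOf("]");
  if (start === -1 || end <= start) {
    throw new Error("Model reply did not contain a JSON array");
  }
  const parsed: unknown = JSON.parse(reply.slice(start, end + 1));
  if (!Array.isArray(parsed)) {
    throw new Error("Model reply was not a JSON array");
  }
  return parsed as ExtractedItem[];
}

// Describe what to extract in words instead of maintaining selectors.
async function extractWithModel(
  html: string,
  instruction: string,
  model: ModelFn,
): Promise<ExtractedItem[]> {
  const reply = await model(buildExtractionPrompt(html, instruction));
  return parseModelReply(reply);
}
```

Because the instruction describes the data rather than its position in the markup, the same call can keep working after a site changes its class names or layout, at the price of one model request per page.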

Quick Start & Requirements

Highlighted Details

  • AI assistance for parsing HTML and extracting specific data points.
  • Supports dynamic pages, static pages, API data, and file downloads.
  • Features include automated operations (keyboard input, events), device fingerprinting, asynchronous/synchronous modes, interval crawling, and proxy rotation.
  • Built with TypeScript, offering complete type definitions.
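
Interval crawling and proxy rotation, two of the features listed above, can be illustrated with a small self-contained sketch. It shows one plausible way to combine them and does not reproduce x-crawl's implementation: `createProxyRotator`, `pickDelayMs`, `crawlWithRotation`, and the injected `fetchVia` function are hypothetical names.

```typescript
// Hypothetical sketch; these names are not part of x-crawl.

// Fetches a URL through the given proxy and resolves with the response body.
type FetchVia = (url: string, proxy: string) => Promise<string>;

// Cycle through a fixed proxy list, wrapping around at the end.
function createProxyRotator(proxies: string[]) {
  if (proxies.length === 0) {
    throw new Error("At least one proxy URL is required");
  }
  let index = 0;
  return {
    current: () => proxies[index],
    rotate: () => {
      index = (index + 1) % proxies.length;
      return proxies[index];
    },
  };
}

// Pick a delay in [min, max) so requests are not evenly spaced.
function pickDelayMs(
  range: { min: number; max: number },
  random: () => number = Math.random,
): number {
  return Math.floor(range.min + random() * (range.max - range.min));
}

const sleep = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Crawl targets one by one, switching proxy after each failed attempt
// and pausing for a random interval between targets.
async function crawlWithRotation(
  urls: string[],
  proxies: string[],
  fetchVia: FetchVia,
  interval: { min: number; max: number },
  maxAttempts = 3,
): Promise<Record<string, string>> {
  const rotator = createProxyRotator(proxies);
  const results: Record<string, string> = {};
  for (let i = 0; i < urls.length; i++) {
    for (let attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        results[urls[i]] = await fetchVia(urls[i], rotator.current());
        break;
      } catch {
        rotator.rotate(); // assume the proxy is the problem and try the next one
      }
    }
    if (i < urls.length - 1) {
      await sleep(pickDelayMs(interval));
    }
  }
  return results;
}
```

Targets that fail on every attempt are simply absent from the result object, so a caller can compare its keys against the input list to find them.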

Maintenance & Community

  • Active development by CoderHXL.
  • Community support available via Discord.
  • Issues and suggestions can be raised on GitHub Issues.

Licensing & Compatibility

  • MIT License.
  • Permissive license suitable for commercial use and integration into closed-source projects.

Limitations & Caveats

The library is intended for legal use only, and users must comply with each target site's robots.txt rules. The AI-assisted features can consume many tokens and may incur costs when used with paid services such as OpenAI.
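
As a rough illustration of the robots.txt point, a crawler can check a path against a site's rules before requesting it. The sketch below is deliberately simplified and is not taken from x-crawl: `parseRobotsForAllAgents` and `isPathAllowed` are hypothetical helpers that only read groups addressed to `User-agent: *`, use plain prefix matching, and ignore wildcards (`*`, `$`) and `Crawl-delay`.

```typescript
// Hypothetical helpers; a simplified reading of robots.txt, not a full parser.

interface RobotsRule {
  allow: boolean;
  path: string;
}

// Collect Allow/Disallow rules from groups that apply to every crawler ("*").
function parseRobotsForAllAgents(robotsTxt: string): RobotsRule[] {
  const rules: RobotsRule[] = [];
  let groupApplies = false;
  let lastWasAgent = false;
  for (const rawLine of robotsTxt.split(/\r?\n/)) {
    const line = rawLine.replace(/#.*$/, "").trim(); // drop comments
    const sep = line.indexOf(":");
    if (sep === -1) continue;
    const key = line.slice(0, sep).trim().toLowerCase();
    const value = line.slice(sep + 1).trim();
    if (key === "user-agent") {
      // Consecutive User-agent lines belong to the same group.
      groupApplies = lastWasAgent
        ? groupApplies || value === "*"
        : value === "*";
      lastWasAgent = true;
      continue;
    }
    lastWasAgent = false;
    if (!groupApplies || value === "") continue; // empty Disallow allows all
    if (key === "disallow") rules.push({ allow: false, path: value });
    else if (key === "allow") rules.push({ allow: true, path: value });
  }
  return rules;
}

// The longest matching prefix wins; on a tie, Allow beats Disallow.
function isPathAllowed(rules: RobotsRule[], path: string): boolean {
  let best: RobotsRule | undefined;
  for (const rule of rules) {
    if (!path.startsWith(rule.path)) continue;
    if (
      !best ||
      rule.path.length > best.path.length ||
      (rule.path.length === best.path.length && rule.allow)
    ) {
      best = rule;
    }
  }
  return best ? best.allow : true; // no matching rule means allowed
}
```

A production crawler would more likely rely on a maintained robots.txt parser, since real files use wildcards and agent-specific groups that this sketch skips.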

Health Check

  • Last commit: 1 day ago
  • Responsiveness: 1 day
  • Pull requests (30d): 1
  • Issues (30d): 2
  • Star history: 50 stars in the last 90 days
