x-crawl by coder-hxl

Node.js library for AI-assisted web crawling

Created 3 years ago

1,765 stars

Top 24.1% on SourcePulse

Project Summary

x-crawl is a flexible Node.js crawling library with AI-assisted features, aimed at developers who need to extract data from dynamic or static web pages, APIs, and files. It integrates with OpenAI and Ollama so that a crawler can work from a semantic reading of page content, which makes it more resilient to changes in website structure.

How It Works

x-crawl uses a headless browser (Puppeteer) to render and interact with dynamic pages. Its distinguishing feature is AI integration: users can pass a page's HTML, or specific elements from it, to OpenAI or Ollama models for data extraction, summarization, or transformation. Because the model is told what data is wanted rather than where it sits in the markup, this reduces reliance on brittle CSS selectors or XPath and makes crawlers more robust against website updates.
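The idea of handing HTML plus a plain-language instruction to a model can be sketched as follows. This is a minimal illustration, not x-crawl's API: `ModelClient`, `ExtractionRequest`, `buildExtractionPrompt`, `parseModelReply`, and `extractWithModel` are hypothetical names, and the model is represented by any function that maps a prompt to a reply.

```typescript
// Hypothetical types and helpers for illustration; these are not x-crawl's API.
type ModelClient = (prompt: string) => Promise<string>;

interface ExtractionRequest {
  html: string;        // raw HTML of the page or of one element
  instruction: string; // plain-language description of the data wanted
}

// Build a prompt that asks for a JSON array so the reply is machine-readable.
function buildExtractionPrompt(req: ExtractionRequest): string {
  return [
    "Extract data from the HTML below.",
    `Task: ${req.instruction}`,
    "Reply with a JSON array of strings and nothing else.",
    "HTML:",
    req.html,
  ].join("\n");
}

// Parse the reply defensively: a model may wrap the JSON in extra text.
function parseModelReply(reply: string): string[] {
  const start = reply.indexOf("[");
  const end = reply.lastIndexOf("]");
  if (start === -1 || end <= start) return [];
  try {
    const parsed: unknown = JSON.parse(reply.slice(start, end + 1));
    return Array.isArray(parsed)
      ? parsed.filter((v): v is string => typeof v === "string")
      : [];
  } catch {
    return []; // not valid JSON: treat as "nothing extracted"
  }
}

// Send the prompt to whichever model client the caller supplies.
async function extractWithModel(
  model: ModelClient,
  req: ExtractionRequest,
): Promise<string[]> {
  const reply = await model(buildExtractionPrompt(req));
  return parseModelReply(reply);
}
```

Note that the instruction names the data ("image URLs", "product prices") rather than a selector path, so the same request can keep working after a site reshuffles its markup. A stub such as `async () => '["a.png"]'` can stand in for the model client when experimenting.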

Quick Start & Requirements

Install via npm: npm install x-crawl
Requires Node.js.
AI features require API keys for OpenAI or a running Ollama instance.
Documentation: English, Simplified Chinese, V9 English, V9 Simplified Chinese

Highlighted Details

AI assistance for parsing HTML and extracting specific data points.
Supports dynamic pages, static pages, API data, and file downloads.
Features include automated operations (keyboard input, events), device fingerprinting, asynchronous/synchronous modes, interval crawling, and proxy rotation.
Built with TypeScript, offering complete type definitions.
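Two of the features listed above, interval crawling and proxy rotation, can be illustrated with a short sketch. The names `pickInterval` and `ProxyRotator` and the proxy URLs in the usage note are hypothetical and are not taken from x-crawl; the code only shows the general technique of randomizing the delay between requests and switching proxy after repeated failures.

```typescript
// Illustrative helpers only; names are hypothetical, not x-crawl's API.

// Pick a random delay in [minMs, maxMs] so requests are not evenly spaced.
function pickInterval(
  minMs: number,
  maxMs: number,
  rand: () => number = Math.random,
): number {
  if (minMs > maxMs) throw new RangeError("minMs must not exceed maxMs");
  return Math.floor(minMs + rand() * (maxMs - minMs + 1));
}

// Round-robin proxy selection that advances after a number of consecutive failures.
class ProxyRotator {
  private readonly urls: string[];
  private readonly switchAfterErrors: number;
  private index = 0;
  private errors = 0;

  constructor(urls: string[], switchAfterErrors: number = 1) {
    if (urls.length === 0) throw new Error("at least one proxy URL is required");
    this.urls = urls.slice();
    this.switchAfterErrors = switchAfterErrors;
  }

  // Proxy to use for the next request.
  current(): string {
    return this.urls[this.index];
  }

  // Record a failed request; rotate once the failure threshold is reached.
  reportFailure(): void {
    this.errors += 1;
    if (this.errors >= this.switchAfterErrors) {
      this.index = (this.index + 1) % this.urls.length;
      this.errors = 0;
    }
  }

  // A successful request resets the consecutive-failure counter.
  reportSuccess(): void {
    this.errors = 0;
  }
}
```

A crawler loop would wait `pickInterval(1000, 2000)` milliseconds between targets, send each request through `rotator.current()`, and call `reportFailure()` or `reportSuccess()` depending on the outcome.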

Maintenance & Community

Actively developed by coder-hxl.
Community support available via Discord.
Issues and suggestions can be raised on GitHub Issues.

Licensing & Compatibility

MIT License.
Permissive license suitable for commercial use and integration into closed-source projects.

Limitations & Caveats

The library is intended for lawful use only, and users are expected to respect each site's robots.txt rules. The AI-assisted features send page content to a model, so they can consume many tokens and may incur costs when a paid service such as OpenAI is used.
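One common way to limit token use is to trim the HTML before sending it to a model. The sketch below is an assumption-laden illustration, not part of x-crawl: `trimHtmlForModel` and `estimateTokens` are hypothetical helpers, the regex-based stripping is a simplification that a real HTML parser would handle more reliably, and the figure of roughly 4 characters per token is only a rule of thumb for English text, not an exact tokenizer count.

```typescript
// Hypothetical helpers for illustration; not part of x-crawl.

// Drop script/style blocks and comments, then collapse runs of whitespace.
function trimHtmlForModel(html: string): string {
  return html
    .replace(/<(script|style)\b[^>]*>[\s\S]*?<\/\1>/gi, "")
    .replace(/<!--[\s\S]*?-->/g, "")
    .replace(/\s+/g, " ")
    .trim();
}

// Rough size estimate; charsPerToken = 4 is an assumed heuristic.
function estimateTokens(text: string, charsPerToken: number = 4): number {
  return Math.ceil(text.length / charsPerToken);
}
```

Comparing `estimateTokens(html)` with `estimateTokens(trimHtmlForModel(html))` gives a quick sense of how much of a page is markup the model does not need.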

