ai-crawler-py by oxylabs

AI web crawler app for prompt-guided data extraction

Created 3 months ago
2,440 stars

Top 18.6% on SourcePulse

Project Summary

Oxylabs AI-Crawler is an experimental Python tool that simplifies web data extraction by using natural language prompts to guide crawling and data retrieval. It targets developers and data scientists, letting them focus on data analysis rather than on building and maintaining complex web scrapers. The primary benefit is an AI-driven, low-code approach to acquiring structured data from websites.

How It Works

The AI-Crawler starts from a specified URL and uses the user's natural language prompt to decide which linked pages are relevant. AI handles both URL selection and content extraction. For JSON output, users can describe a schema in natural language, which the crawler uses to structure the extracted data, or opt for automatic schema generation. Because extraction adapts to each page's content, the approach reduces the need for brittle, static selectors.
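To make the idea of prompt-guided URL selection concrete, here is a minimal sketch. It is not the crawler's actual algorithm: a simple keyword-overlap score stands in for the AI model, and the function names (tokenize, score_url, select_urls) and the example.com URLs are hypothetical.

```python
import re
from urllib.parse import urlparse


def tokenize(text: str) -> set[str]:
    """Lowercase a string and split it into alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def score_url(url: str, prompt: str) -> int:
    """Toy relevance score: number of prompt words that appear in the URL path.

    Assumption: a real system would use a language model here, not word overlap.
    """
    path_tokens = tokenize(urlparse(url).path)
    return len(path_tokens & tokenize(prompt))


def select_urls(candidates: list[str], prompt: str, limit: int) -> list[str]:
    """Return up to `limit` candidate URLs with a non-zero score, best first."""
    scored = [(score_url(url, prompt), url) for url in candidates]
    relevant = [(score, url) for score, url in scored if score > 0]
    # list.sort is stable, so URLs with equal scores keep their input order.
    relevant.sort(key=lambda pair: -pair[0])
    return [url for _, url in relevant[:limit]]


candidates = [
    "https://example.com/about",
    "https://example.com/products/proxy-pricing",
    "https://example.com/blog/pricing-update",
    "https://example.com/careers",
]
selected = select_urls(candidates, "find proxy pricing pages", limit=2)
print(selected)
```

The point of the sketch is the shape of the step (score every discovered link against the prompt, keep the best few), not the scoring function itself, which is deliberately naive.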

Quick Start & Requirements

  • Installation: pip install oxylabs-ai-studio
  • Prerequisites: Python 3.10+ and an Oxylabs API key (a free trial with 1,000 credits is available).
  • Documentation: Oxylabs AI Studio Python SDK
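Before calling the SDK, it can help to verify the two prerequisites listed above. The following is a small preflight sketch using only the standard library. The environment variable name OXYLABS_AI_STUDIO_API_KEY is an assumption made for illustration; the README summary does not say where the key should be stored.

```python
import os
import sys

MIN_PYTHON = (3, 10)
# Hypothetical variable name, chosen for this example only.
API_KEY_ENV_VAR = "OXYLABS_AI_STUDIO_API_KEY"


def check_environment(env: dict[str, str], version: tuple[int, int]) -> list[str]:
    """Return human-readable problems; an empty list means the checks passed."""
    problems = []
    if version < MIN_PYTHON:
        problems.append(
            f"Python {MIN_PYTHON[0]}.{MIN_PYTHON[1]}+ is required, "
            f"found {version[0]}.{version[1]}"
        )
    if not env.get(API_KEY_ENV_VAR, "").strip():
        problems.append(f"Set {API_KEY_ENV_VAR} to your Oxylabs API key")
    return problems


# Check the running interpreter and the real process environment.
problems = check_environment(dict(os.environ), sys.version_info[:2])
for problem in problems:
    print("preflight:", problem)
```

Passing the environment and version in as arguments keeps the check easy to test without touching the real process state.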

Highlighted Details

  • Natural Language Prompting: Define data extraction goals in plain English.
  • AI-Assisted URL Selection: Intelligently prioritizes pages relevant to the prompt.
  • Flexible Output: Supports structured JSON (with schema-based parsing) and Markdown formats.
  • Schema Generation: Automatically creates parsing schemas from natural language prompts.
  • JavaScript Rendering: Option to enable JavaScript rendering for dynamic content.
  • Geo-Targeting: Proxy location can be specified.
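The options in the list above can be pictured as a single request configuration. The sketch below is illustrative only: CrawlOptions, its field names, and missing_fields are hypothetical and are not taken from the Oxylabs SDK, and the JSON-Schema-style dictionary is just one plausible form a parsing schema could take. It also assumes that for JSON output a schema (supplied or generated) is resolved before the crawl starts.

```python
import json
from dataclasses import asdict, dataclass
from typing import Any, Optional


@dataclass
class CrawlOptions:
    """Hypothetical option bundle; field names are illustrative, not the SDK's."""

    url: str
    prompt: str
    output_format: str = "markdown"  # "markdown" or "json"
    schema: Optional[dict[str, Any]] = None  # only needed for JSON output
    render_javascript: bool = False
    geo_location: Optional[str] = None

    def validate(self) -> None:
        """Raise ValueError if the combination of options is inconsistent."""
        if self.output_format not in ("markdown", "json"):
            raise ValueError(f"unsupported output format: {self.output_format}")
        if self.output_format == "json" and self.schema is None:
            raise ValueError("JSON output needs a schema (supplied or generated)")


# A JSON-Schema-style description of one extracted record (assumed shape).
product_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "price": {"type": "number"},
    },
    "required": ["name", "price"],
}


def missing_fields(record: dict[str, Any], schema: dict[str, Any]) -> list[str]:
    """List required schema fields that are absent from an extracted record."""
    return [key for key in schema.get("required", []) if key not in record]


options = CrawlOptions(
    url="https://example.com",
    prompt="Extract product names and prices",
    output_format="json",
    schema=product_schema,
    render_javascript=True,
    geo_location="US",
)
options.validate()
print(json.dumps(asdict(options), indent=2))
print(missing_fields({"name": "Widget"}, product_schema))
```

Consult the Oxylabs AI Studio Python SDK documentation for the real parameter names and accepted values.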

Maintenance & Community

  • Support: Available via email (hello@oxylabs.io) or live chat.
  • No explicit community forums (e.g., Discord, Slack) are mentioned in the README.

Licensing & Compatibility

  • License: Not explicitly stated in the provided README.
  • Compatibility: Designed for Python 3.10+.

Limitations & Caveats

The tool is described as "experimental." It requires an Oxylabs API key, and usage is metered through a credit system once the free trial is used up. Crawling is limited to publicly accessible websites, and users must ensure compliance with website terms of service and local laws.

Health Check

  • Last Commit: 3 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 232 stars in the last 30 days

Explore Similar Projects

  • stagehand by browserbase: AI browser automation framework for production. Top 1.5%, 20k stars. Created 1 year ago, updated 2 days ago. Starred by Will Brown (Research Lead at Prime Intellect), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 16 more.
  • firecrawl by firecrawl: API service for turning websites into LLM-ready data. Top 1.8%, 74k stars. Created 1 year ago, updated 2 days ago. Starred by Tobi Lutke (Cofounder of Shopify), Dirk Englund (MIT EECS Professor and Cofounder of Axiomatic AI), and 25 more.