ai-crawler-py by oxylabs

AI web crawler app for prompt-guided data extraction

Created 1 month ago
726 stars

Top 47.5% on SourcePulse

Project Summary

Oxylabs AI-Crawler is an experimental Python tool that simplifies web data extraction by using natural language prompts to guide crawling and data retrieval. It targets developers and data scientists, letting them focus on data analysis instead of building and maintaining complex web scrapers. Its main benefit is an AI-driven, low-code way to collect structured data from websites.

How It Works

The AI-Crawler starts from a user-specified URL and uses AI to choose which pages are relevant to the user's natural language prompt and to extract content from them. For JSON output, users can describe the desired schema in natural language, which the crawler then uses to structure the extracted data, or they can let the crawler generate a schema automatically. Because extraction is driven by the prompt and not by hand-written selectors, it adapts to each site's content and reduces reliance on brittle, static selectors.
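A minimal Python sketch can make the URL-selection step more concrete. It is not Oxylabs' method. The AI relevance model is replaced with a simple keyword-overlap score between the prompt and each candidate URL's path. The names `tokenize` and `rank_urls` and the `limit` parameter are hypothetical.

```python
import re
from urllib.parse import urlparse


def tokenize(text: str) -> set[str]:
    """Split text into lowercase alphanumeric tokens (a crude stand-in for an AI model)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def rank_urls(prompt: str, candidates: list[str], limit: int = 3) -> list[str]:
    """Return up to `limit` URLs whose paths share words with the prompt, best first."""
    prompt_tokens = tokenize(prompt)
    scored = []
    for url in candidates:
        # Relevance = number of prompt words that also appear in the URL path.
        overlap = len(prompt_tokens & tokenize(urlparse(url).path))
        if overlap > 0:
            scored.append((overlap, url))
    # Python's sort is stable even with reverse=True, so ties keep discovery order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [url for _, url in scored[:limit]]


if __name__ == "__main__":
    links = [
        "https://example.com/blog/company-news",
        "https://example.com/pricing",
        "https://example.com/features/overview",
        "https://example.com/careers",
    ]
    print(rank_urls("Find all pricing plans and their features", links))
```

For the sample prompt, the pricing and features pages are kept in that order, and the blog and careers pages are dropped. A real AI model would also match synonyms and page content, which word overlap cannot do.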

Quick Start & Requirements

  • Installation: pip install oxylabs-ai-studio
  • Prerequisites: Python 3.10+ and an Oxylabs API key (a free trial with 1,000 credits is available).
  • Documentation: Oxylabs AI Studio Python SDK
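A short script can check both prerequisites above before installing. This is only a sketch. The environment variable name `OXYLABS_API_KEY` and the function `check_prerequisites` are assumptions for illustration and are not part of the SDK.

```python
import os
import sys

MIN_PYTHON = (3, 10)
# Hypothetical variable name; the README only says an Oxylabs API key is required.
DEFAULT_KEY_VAR = "OXYLABS_API_KEY"


def check_prerequisites(key_var: str = DEFAULT_KEY_VAR) -> list[str]:
    """Return human-readable problems; an empty list means both checks passed."""
    problems = []
    if sys.version_info < MIN_PYTHON:
        found = ".".join(str(part) for part in sys.version_info[:3])
        problems.append(f"Python {MIN_PYTHON[0]}.{MIN_PYTHON[1]}+ required, found {found}")
    if not os.environ.get(key_var):
        problems.append(f"No API key found in environment variable {key_var}")
    return problems


if __name__ == "__main__":
    issues = check_prerequisites()
    for issue in issues:
        print("missing:", issue)
    if not issues:
        print("All prerequisites look satisfied.")
```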

Highlighted Details

  • Natural Language Prompting: Define data extraction goals in plain English.
  • AI-Assisted URL Selection: Prioritizes the pages most relevant to the prompt.
  • Flexible Output: Supports structured JSON (with schema-based parsing) and Markdown formats.
  • Schema Generation: Automatically creates parsing schemas from natural language prompts.
  • JavaScript Rendering: Option to enable JavaScript rendering for dynamic content.
  • Geo-Targeting: Proxy location can be specified.
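A hand-written example can illustrate schema-based JSON output. The sketch below assumes that a generated schema looks like a JSON Schema object; this is an assumption, because the README excerpt does not show the SDK's schema format. It then checks extracted records against that schema with a deliberately minimal validator. `SCHEMA`, `TYPE_MAP`, and `validate` are hypothetical names.

```python
import json

# Hypothetical schema for the prompt "product name, price in USD, and stock status".
# The JSON Schema shape is an assumption, not the SDK's documented format.
SCHEMA = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "price_usd": {"type": "number"},
        "in_stock": {"type": "boolean"},
    },
    "required": ["name", "price_usd"],
}

# Python types accepted for each JSON Schema primitive type used above.
TYPE_MAP = {"string": str, "number": (int, float), "boolean": bool}


def validate(record: dict, schema: dict) -> list[str]:
    """Return a list of problems; an empty list means the record fits the schema."""
    errors = [f"missing required field: {field}"
              for field in schema.get("required", []) if field not in record]
    for field, spec in schema["properties"].items():
        if field not in record:
            continue  # optional fields may be absent
        value = record[field]
        # bool subclasses int in Python, so True must not count as a number.
        bool_as_number = spec["type"] == "number" and isinstance(value, bool)
        if bool_as_number or not isinstance(value, TYPE_MAP[spec["type"]]):
            errors.append(f"{field}: expected {spec['type']}, got {type(value).__name__}")
    return errors


if __name__ == "__main__":
    good = json.loads('{"name": "Desk lamp", "price_usd": 24.99, "in_stock": true}')
    bad = {"price_usd": "24.99"}
    print(validate(good, SCHEMA))
    print(validate(bad, SCHEMA))
```

For nested objects or arrays, an established validator such as the `jsonschema` package would be a sturdier choice than this hand-rolled check.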

Maintenance & Community

  • Support: Available via email (hello@oxylabs.io) or live chat.
  • No explicit community forums (e.g., Discord, Slack) are mentioned in the README.

Licensing & Compatibility

  • License: Not explicitly stated in the provided README.
  • Compatibility: Designed for Python 3.10+.

Limitations & Caveats

The tool is described as "experimental." It requires an Oxylabs API key, with usage subject to a credit system after a free trial. Crawlability is limited to publicly accessible websites, and users must ensure compliance with website terms of service and local laws.

Health Check

  • Last commit: 3 weeks ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 645 stars in the last 30 days

Explore Similar Projects

  • stagehand by browserbase: AI browser automation framework for production. Top 0.6%, 19k stars; created 1 year ago, updated 1 day ago. Starred by Chip Huyen (author of "AI Engineering" and "Designing Machine Learning Systems"), Luis Capelo (cofounder of Lightning AI), and 15 more.