Open-source web crawler for AI tool detail extraction
This project provides a web crawler that extracts website information, generates screenshots, and uses LLMs to summarize content and create SEO-friendly descriptions. It's designed for individual developers managing AI tool directories and learners interested in Python web scraping and AI integration.
How It Works
The crawler is a lightweight Python application. It fetches the title, description, and introduction from each specified URL and generates a screenshot of the page. It then passes the website introduction to an LLM (such as Llama 3 or ChatGPT via Groq) to produce a summarized, SEO-optimized description in Markdown.
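The title and description extraction described above can be illustrated with a minimal sketch. This is not the project's own code: it uses only Python's standard html.parser module, and the names PageInfoParser and extract_page_info, as well as the sample HTML, are hypothetical. It also assumes the description comes from a meta tag named "description", which the source does not confirm.

```python
from html.parser import HTMLParser


class PageInfoParser(HTMLParser):
    """Collects the <title> text and the meta description from an HTML page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.description = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            attr_map = dict(attrs)
            # Assumption: the description lives in <meta name="description">.
            if (attr_map.get("name") or "").lower() == "description":
                self.description = (attr_map.get("content") or "").strip()

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Title text can arrive in several chunks, so accumulate it.
        if self._in_title:
            self.title += data


def extract_page_info(html):
    """Return the page title and meta description as a dict."""
    parser = PageInfoParser()
    parser.feed(html)
    return {"title": parser.title.strip(), "description": parser.description}


# Hypothetical sample page; a real crawler would download the HTML first.
sample_html = (
    "<html><head><title>Example AI Tool</title>"
    '<meta name="description" content="Summarizes documents with AI.">'
    "</head><body><p>Hello</p></body></html>"
)
page_info = extract_page_info(sample_html)
print(page_info)
```

In a real crawl the HTML would be fetched over HTTP, and the extracted introduction would then be handed to the LLM step for summarization.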
Quick Start & Requirements
Clone the repository (git clone https://github.com/6677-ai/tap4-ai-crawler.git) and install dependencies (pip install -r requirements.txt).
Configure .env and run python main_api.py.
Use curl to send POST requests with JSON payloads containing the URL and optional tags.
Highlighted Details
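As a rough illustration of the POST request mentioned in the Quick Start steps, the sketch below builds an equivalent request in Python instead of curl. The endpoint address, the route /crawl, and the JSON key names "url" and "tags" are assumptions made for illustration; the actual values are defined by the project's main_api.py and .env, which this summary does not show. The helper build_crawl_request is likewise hypothetical.

```python
import json
import urllib.request

# Hypothetical endpoint: the real host, port, and route come from the
# project's main_api.py and .env configuration.
API_URL = "http://127.0.0.1:8000/crawl"


def build_crawl_request(site_url, tags=None, api_url=API_URL):
    """Build (but do not send) a JSON POST request for one site."""
    # Assumed key names; the source only says the payload holds a URL
    # and optional tags.
    payload = {"url": site_url}
    if tags:
        payload["tags"] = tags
    return urllib.request.Request(
        api_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_crawl_request("https://example.com", tags=["chatbot", "writing"])
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))

# To send it once the server is running:
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```

Building the request separately from sending it makes it easy to inspect the payload before pointing it at a running server.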
Maintenance & Community
The project is associated with tap4.ai. Contact information includes a Twitter handle (https://x.com/tap4ai) and a WeChat contact for inquiries.
Licensing & Compatibility
The repository does not explicitly state a license in the provided README.
Limitations & Caveats
Crawling may fail because of anti-scraping measures, in which case the results need a manual check. LLM output may not always meet expectations and may require prompt optimization or manual review. Web scraping requires specific server configurations; paid services like Zeabur with U.S. nodes are recommended for optimal performance.