BrowserCash
High-performance web crawler API for LLM data extraction
Top 96.8% on SourcePulse
Teracrawl is a high-performance, production-ready API designed to convert web content into clean, LLM-ready Markdown. It addresses the challenges of JavaScript rendering, anti-bot measures, and complex HTML structures, making real-time data accessible to AI systems. The tool is ideal for engineers and researchers building applications that require robust web scraping and data extraction for natural language processing tasks.
How It Works
Teracrawl uses managed remote Chrome browsers, powered by Browser.cash, to keep success rates high even on protected websites. Its core innovation is a "Smart Two-Phase Crawling" approach: a Fast Mode optimized for static/SSR pages, with an automatic fallback to a Dynamic Mode that waits for rendering on complex Single Page Applications (SPAs). It also offers a combined "Search + Scrape" endpoint that queries Google and scrapes the top results in parallel, converting raw HTML into semantic Markdown suitable for Retrieval-Augmented Generation (RAG) and LLM context windows.
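The fast-then-dynamic fallback described above can be sketched roughly as follows. This is an illustrative sketch only, not Teracrawl's actual implementation: the names `looksRendered`, `crawlTwoPhase`, and `PageFetcher`, as well as the text-length heuristic and its threshold of 200 characters, are assumptions made for the example.

```typescript
// Sketch of a two-phase crawl: try a cheap fetch first, then fall back to a
// rendered fetch when the result looks like an empty SPA shell.
// All names here are hypothetical and not taken from Teracrawl's source.

type PageFetcher = (url: string) => Promise<string>;

interface CrawlResult {
  mode: "fast" | "dynamic";
  html: string;
}

// Assumed heuristic: a page counts as rendered when its visible text,
// after dropping scripts, styles, and tags, reaches a minimum length.
function looksRendered(html: string, minTextLength: number = 200): boolean {
  const text = html
    .replace(/<script[\s\S]*?<\/script>/gi, " ")
    .replace(/<style[\s\S]*?<\/style>/gi, " ")
    .replace(/<[^>]+>/g, " ")
    .replace(/\s+/g, " ")
    .trim();
  return text.length >= minTextLength;
}

async function crawlTwoPhase(
  url: string,
  fastFetch: PageFetcher,
  dynamicFetch: PageFetcher,
): Promise<CrawlResult> {
  try {
    const html = await fastFetch(url);
    if (looksRendered(html)) {
      return { mode: "fast", html };
    }
  } catch {
    // A failed fast attempt is treated the same as an unrendered page.
  }
  // Fallback: the dynamic fetcher is expected to wait for client-side rendering.
  return { mode: "dynamic", html: await dynamicFetch(url) };
}
```

Passing the two fetchers in as parameters keeps the fallback logic independent of how pages are actually retrieved. In a real deployment the fast fetcher might be a quick page load and the dynamic one a remote browser session that waits for rendering, but that mapping is an assumption here.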
Quick Start & Requirements
Clone the repository (git clone https://github.com/BrowserCash/teracrawl.git), navigate into the directory (cd teracrawl), and install dependencies (npm install).
A browser-serp service on port 8080 is necessary for the /crawl endpoint.
Run npm run dev for development, or npm run build followed by npm start for production. The server typically runs at http://0.0.0.0:8085.
Highlighted Details
Provides a /crawl endpoint for Google search and scraping top results, and a /scrape endpoint for direct URL conversion.
Maintenance & Community
The project welcomes contributions via pull requests. Specific community channels (like Discord/Slack) or notable maintainers/sponsors are not detailed in the README.
Licensing & Compatibility
This project is licensed under the MIT License, which is permissive for commercial use and integration into closed-source projects.
Limitations & Caveats
The search functionality (/crawl endpoint) depends on a separately running browser-serp service. Users must also obtain and configure a Browser.cash API key.