llm-codes by amantus-ai

Convert complex developer docs into AI-friendly Markdown

Created 8 months ago
273 stars

Top 94.7% on SourcePulse

Project Summary

AI agents often struggle to parse modern, JavaScript-heavy developer documentation sites. This project provides a high-performance web service that converts these dynamic sites into clean, LLM-optimized Markdown so that agents can read the technical information they contain. It is aimed at developers, researchers, and anyone building AI systems that need to comprehend technical documentation.

How It Works

The service uses Firecrawl's headless browser to execute JavaScript and capture the fully rendered HTML, then transforms that content into clean, semantic Markdown. It removes extraneous elements such as navigation, boilerplate, and duplicate content to conserve AI context tokens. Key architectural choices are parallel URL processing for efficiency, a Redis-backed caching layer that reduces API calls, and configurable content filtering strategies that tailor the output for AI consumption.
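The filtering step can be illustrated with a minimal sketch. It is not taken from the llm-codes source: the function name `filterMarkdown` and both regular expressions are assumptions made for illustration, and dropping every link-only line is a deliberately crude stand-in for navigation removal. It covers three of the cleanups named on this page: platform availability strings such as "iOS 14.0+", navigation-style lines, and duplicate paragraphs.

```typescript
// Hypothetical helper: not taken from the llm-codes source.
// Matches a line made up only of platform availability tokens, e.g. "iOS 14.0+".
const AVAILABILITY_LINE =
  /^(?:(?:iOS|iPadOS|macOS|tvOS|watchOS|visionOS|Mac Catalyst)\s+\d+(?:\.\d+)*\+\s*)+$/;

// Matches a line that is nothing but one Markdown link (optionally a list item),
// which is typical of navigation menus.
const LINK_ONLY_LINE = /^(?:[-*]\s+)?\[[^\]]*\]\([^)]*\)$/;

function filterMarkdown(markdown: string): string {
  const seen = new Set<string>();
  const kept: string[] = [];

  // Treat chunks separated by blank lines as paragraphs.
  for (const block of markdown.split(/\n{2,}/)) {
    const lines = block.split("\n").filter((line) => {
      const trimmed = line.trim();
      return !AVAILABILITY_LINE.test(trimmed) && !LINK_ONLY_LINE.test(trimmed);
    });
    const text = lines.join("\n").trim();
    if (text === "") continue; // the whole paragraph was noise

    // Drop repeats, ignoring case and whitespace differences.
    const key = text.toLowerCase().replace(/\s+/g, " ");
    if (seen.has(key)) continue;
    seen.add(key);
    kept.push(text);
  }
  return kept.join("\n\n");
}
```

A real filter would need to leave fenced code blocks untouched and keep link-only lines that are genuine content, such as "See also" lists; this sketch does neither.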

Quick Start & Requirements

  • Install and run: Clone the repository, run npm install, create a .env.local file containing your FIRECRAWL_API_KEY, then start the development server with npm run dev.
  • Prerequisites: Node.js 20+, npm or yarn, and a Firecrawl API key. Redis is recommended for production caching.
  • Deployment: Easily deployable to Vercel.
  • Demo: A live demo is available at llm.codes.

Highlighted Details

  • Parallel Processing: Fetches up to 20 URLs concurrently using batched promises.
  • Smart Caching: Utilizes a Redis-backed 30-day cache to minimize API calls and improve response times.
  • Content Filtering: Offers multiple strategies to remove navigation, boilerplate, platform availability strings (e.g., "iOS 14.0+"), and duplicate content.
  • Pattern-Based Site Support: Automatically supports many documentation sites through intelligent URL pattern matching (e.g., docs.*, developer.*, *.github.io) and explicit exceptions.
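Two of the points above, batched parallel fetching and URL pattern matching, can be sketched as follows. Every name here (`chunk`, `processInBatches`, `isSupportedDocsUrl`, `HOST_EXCEPTIONS`) is hypothetical and not taken from the llm-codes source. Only the batch size of 20 and the three hostname patterns come from this page. The exception list is treated as an allow-list, which is just one possible reading of "explicit exceptions".

```typescript
// All names are hypothetical; they are not taken from the llm-codes source.
const BATCH_SIZE = 20; // this page states that up to 20 URLs are fetched concurrently

type Outcome<R> =
  | { url: string; ok: true; value: R }
  | { url: string; ok: false; error: unknown };

// Split a list into consecutive groups of at most `size` items.
function chunk<T>(items: T[], size: number): T[][] {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError("size must be a positive integer");
  }
  const groups: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    groups.push(items.slice(i, i + size));
  }
  return groups;
}

// Run `worker` over every URL one batch at a time. Requests inside a batch run
// concurrently; a failed URL is recorded instead of aborting the whole run.
async function processInBatches<R>(
  urls: string[],
  worker: (url: string) => Promise<R>,
  batchSize: number = BATCH_SIZE,
): Promise<Outcome<R>[]> {
  const outcomes: Outcome<R>[] = [];
  for (const batch of chunk(urls, batchSize)) {
    const settled = await Promise.all(
      batch.map(async (url): Promise<Outcome<R>> => {
        try {
          return { url, ok: true, value: await worker(url) };
        } catch (error) {
          return { url, ok: false, error };
        }
      }),
    );
    outcomes.push(...settled);
  }
  return outcomes;
}

// Hostname patterns named on this page: docs.*, developer.*, *.github.io.
const HOST_PATTERNS: RegExp[] = [/^docs\./, /^developer\./, /\.github\.io$/];

// Hosts accepted even though they match no pattern (placeholder entry only).
const HOST_EXCEPTIONS = new Set<string>(["example.com"]);

function isSupportedDocsUrl(rawUrl: string): boolean {
  let host: string;
  try {
    host = new URL(rawUrl).hostname.toLowerCase();
  } catch {
    return false; // not a parseable absolute URL
  }
  return HOST_EXCEPTIONS.has(host) || HOST_PATTERNS.some((p) => p.test(host));
}
```

Processing batch by batch keeps at most `batchSize` requests in flight, at the cost of waiting for the slowest URL in each batch before the next one starts.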

Maintenance & Community

Contributions are welcome via pull requests. The project is hosted on GitHub. The README does not list any community channels (such as Discord or Slack), notable contributors, or sponsorships.

Licensing & Compatibility

The project is licensed under the MIT License, which is generally permissive for commercial use and integration into closed-source projects.

Limitations & Caveats

Browser notifications require user permission and may not function in all browsers or configurations. Without Redis caching, users might encounter API rate limits. Pattern matching covers many sites, but some documentation structures may not be fully supported until an explicit exception is added, for example after an issue is reported.
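To show why a cache eases the rate-limit problem, here is a minimal cache-aside sketch. It uses an in-memory Map as a stand-in for Redis and is not the project's implementation: `TtlCache`, `getOrFetch`, and `fetchMarkdown` are hypothetical names. The 30-day lifetime is the only value taken from this page.

```typescript
// Hypothetical sketch: an in-memory stand-in for the Redis cache.
const THIRTY_DAYS_MS = 30 * 24 * 60 * 60 * 1000; // cache lifetime stated on this page

interface CacheEntry {
  value: string;
  expiresAt: number; // epoch milliseconds
}

class TtlCache {
  private readonly entries = new Map<string, CacheEntry>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  // `now` is injectable so that expiry can be exercised without waiting.
  constructor(ttlMs: number = THIRTY_DAYS_MS, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  get(key: string): string | undefined {
    const entry = this.entries.get(key);
    if (entry === undefined) return undefined;
    if (this.now() >= entry.expiresAt) {
      this.entries.delete(key); // expired: evict and report a miss
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: string): void {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }
}

// Cache-aside lookup: the rate-limited call `fetchMarkdown` runs only on a miss.
async function getOrFetch(
  cache: TtlCache,
  url: string,
  fetchMarkdown: (url: string) => Promise<string>,
): Promise<string> {
  const hit = cache.get(url);
  if (hit !== undefined) return hit;
  const fresh = await fetchMarkdown(url);
  cache.set(url, fresh);
  return fresh;
}
```

An in-memory cache like this is lost on restart and is not shared between server instances, which is presumably why Redis is recommended for production.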

Health Check

  • Last commit: 2 months ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 1
  • Star history: 69 stars in the last 30 days
