llmstxt-generator by firecrawl

Web app and API for generating llms.txt files for LLM training and inference

Created 1 year ago

506 stars

Top 61.6% on SourcePulse

1 Expert Loves This Project

Nicolas Camara

Cofounder of Firecrawl

Project Summary

This project provides a tool to generate consolidated text files from websites, specifically designed for Large Language Model (LLM) training and inference. It targets developers and researchers needing to process web content efficiently, offering a streamlined way to prepare data for LLM applications.

How It Works

The system uses Firecrawl to crawl the specified URL and extract its content, then uses GPT-4o-mini to process the text and consolidate it into a single text file. It generates two outputs: a standard llms.txt and a more comprehensive llms-full.txt.

Quick Start & Requirements

Web Interface: Visit llmstxt.firecrawl.dev for browser-based generation.
API Endpoint: GET https://llmstxt.firecrawl.dev/[YOUR_URL_HERE]
Local Development: Run npm install, then npm run dev.
Prerequisites: A .env file with FIRECRAWL_API_KEY, SUPABASE_URL, SUPABASE_KEY, and OPENAI_API_KEY is necessary for local setup.
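
As an illustration of the API endpoint listed above, here is a minimal Python sketch that builds the request URL and fetches the generated text. The function names and the 300-second timeout are assumptions made for this example, not part of the project. The listing does not say whether the target URL must be percent-encoded, so the sketch appends it verbatim, as the endpoint pattern shows.

```python
import urllib.request

# Base URL taken from the endpoint pattern in the listing.
BASE_URL = "https://llmstxt.firecrawl.dev"


def build_request_url(target_url: str) -> str:
    """Append the target site to the base URL, following GET {BASE_URL}/[YOUR_URL_HERE]."""
    target = target_url.strip()
    if not target:
        raise ValueError("target_url must not be empty")
    # Assumption: the target is appended as-is; adjust if the service
    # turns out to expect a percent-encoded value.
    return f"{BASE_URL}/{target}"


def fetch_llms_txt(target_url: str, timeout_s: float = 300.0) -> str:
    """GET the generated text for target_url.

    The generous default timeout is a guess, chosen because generation
    is reported to take several minutes.
    """
    url = build_request_url(target_url)
    with urllib.request.urlopen(url, timeout=timeout_s) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset)


if __name__ == "__main__":
    # Performs a real network request; prints the start of the result.
    print(fetch_llms_txt("https://example.com")[:500])
```

Keeping URL construction separate from the network call makes the request shape easy to check without contacting the service.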

Highlighted Details

Powered by Firecrawl for web crawling and GPT-4o-mini for text processing.
Generates both llms.txt and llms-full.txt output formats.
Offers both a web interface and an API endpoint.
No API key required for basic usage via the web interface.

Maintenance & Community

The project is associated with @firecrawl_dev. Further community or maintenance details are not provided in the README.

Licensing & Compatibility

The README does not specify a license.

Limitations & Caveats

Generation can take several minutes because each request involves both crawling and LLM processing. Local development requires API keys for Firecrawl, Supabase, and OpenAI.
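
Because local development depends on four environment variables, a small preflight check can catch a missing key before the dev server starts. The sketch below is not part of the project. It assumes a simple KEY=value .env layout, and the helper name missing_env_keys is hypothetical.

```python
from pathlib import Path

# Key names as listed under Prerequisites.
REQUIRED_KEYS = (
    "FIRECRAWL_API_KEY",
    "SUPABASE_URL",
    "SUPABASE_KEY",
    "OPENAI_API_KEY",
)


def missing_env_keys(env_text: str, required=REQUIRED_KEYS) -> list[str]:
    """Return the required keys that are absent or empty in .env-style text."""
    found = {}
    for raw in env_text.splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and anything that is not KEY=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Strip surrounding quotes so KEY="" counts as empty.
        found[key.strip()] = value.strip().strip("\"'")
    return [key for key in required if not found.get(key)]


if __name__ == "__main__":
    env_path = Path(".env")
    text = env_path.read_text(encoding="utf-8") if env_path.exists() else ""
    missing = missing_env_keys(text)
    print("Missing keys:", ", ".join(missing) if missing else "none")
```

The parser is deliberately minimal: it does not handle export prefixes, multi-line values, or inline comments.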

Health Check

Last Commit

6 months ago

Responsiveness

Inactive

Star History

19 stars in the last 30 days