firecrawl
CLI tool for LLM training/inference text file generation
Top 64.0% on SourcePulse
This project provides a tool to generate consolidated text files from websites, specifically designed for Large Language Model (LLM) training and inference. It targets developers and researchers needing to process web content efficiently, offering a streamlined way to prepare data for LLM applications.
How It Works
The system leverages FireCrawl for web crawling to extract content from specified URLs. It then utilizes GPT-4-mini for text processing, consolidating the extracted information into a single text file. Two output formats are generated: a standard llms.txt and a more comprehensive llms-full.txt.
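The difference between the two output files can be illustrated with a minimal Python sketch. It assumes the crawl and LLM steps have already produced a title, a short summary, and the extracted content for each page. The `Page` type, its field names, and the exact layout of both files are assumptions made for illustration, not the project's actual implementation or output format.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Page:
    """One crawled page. The type and all field names are hypothetical."""
    url: str
    title: str
    summary: str   # short description, e.g. produced by the LLM step
    markdown: str  # full extracted page content


def build_llms_txt(site_name: str, pages: list[Page]) -> str:
    """Compact index: one link line with a short summary per page."""
    lines = [f"# {site_name}", ""]
    for page in pages:
        lines.append(f"- [{page.title}]({page.url}): {page.summary}")
    return "\n".join(lines) + "\n"


def build_llms_full_txt(site_name: str, pages: list[Page]) -> str:
    """Comprehensive variant: the full content of every page, in order."""
    parts = [f"# {site_name}", ""]
    for page in pages:
        parts.append(f"## {page.title}")
        parts.append(f"Source: {page.url}")
        parts.append("")
        parts.append(page.markdown.strip())
        parts.append("")
    return "\n".join(parts)
```

In this sketch, llms.txt stays small enough to act as an index, while llms-full.txt carries the complete text of each page.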
Quick Start & Requirements
llmstxt.firecrawl.dev for browser-based generation.
API: GET https://llmstxt.firecrawl.dev/[YOUR_URL_HERE]
Local development: npm install and npm run dev.
A .env file with FIRECRAWL_API_KEY, SUPABASE_URL, SUPABASE_KEY, and OPENAI_API_KEY is necessary for local setup.
Highlighted Details
llms.txt and llms-full.txt output formats.
Maintenance & Community
The project is associated with @firecrawl_dev. Further community or maintenance details are not provided in the README.
Licensing & Compatibility
The licensing information is not specified in the README.
Limitations & Caveats
Processing times can be several minutes due to crawling and LLM operations. Local development requires specific API keys for FireCrawl, Supabase, and OpenAI.
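Both caveats can be handled up front in a small client script. The Python sketch below checks for the four keys named in Quick Start and calls the GET endpoint with a generous timeout. It assumes the target URL is appended verbatim after the slash and that the response body is UTF-8 text; neither is confirmed by the README. The helper names and the ten-minute default timeout are illustrative choices.

```python
import os
import urllib.request

BASE_URL = "https://llmstxt.firecrawl.dev/"

# Keys the README lists for local setup.
REQUIRED_KEYS = (
    "FIRECRAWL_API_KEY",
    "SUPABASE_URL",
    "SUPABASE_KEY",
    "OPENAI_API_KEY",
)


def missing_keys(env=None):
    """Return the required keys that are unset or empty in the given mapping."""
    env = os.environ if env is None else env
    return [key for key in REQUIRED_KEYS if not env.get(key)]


def request_url(target_url):
    """Build the GET URL. Assumption: the target is appended as-is."""
    return BASE_URL + target_url


def fetch_llms_txt(target_url, timeout_seconds=600.0):
    """Fetch the generated text, allowing several minutes for processing."""
    with urllib.request.urlopen(request_url(target_url), timeout=timeout_seconds) as response:
        # Assumption: the service returns UTF-8 encoded text.
        return response.read().decode("utf-8")
```

Calling `missing_keys()` before `npm run dev` gives an early, readable failure instead of an error partway through a crawl. The hosted endpoint may not need these keys at all, since the README only requires them for local setup.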
4 months ago
Inactive