CLI tool for LLM training/inference text file generation
This project provides a tool to generate consolidated text files from websites, specifically designed for Large Language Model (LLM) training and inference. It targets developers and researchers needing to process web content efficiently, offering a streamlined way to prepare data for LLM applications.
How It Works
The system uses FireCrawl to crawl the specified URLs and extract their content, then uses GPT-4-mini to process the text and consolidate it into a single text file. It generates two output formats: a standard llms.txt and a more comprehensive llms-full.txt.
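To make the difference between the two outputs concrete, here is a minimal sketch of the consolidation step only. It assumes the crawl and LLM steps have already produced a title, a short description, and markdown for each page. The `CrawledPage` type, the function names, and the exact file layout are illustrative assumptions, not the project's actual code or output format.

```typescript
// Hypothetical shape for one crawled page; the real tool's types may differ.
interface CrawledPage {
  url: string;
  title: string;
  description: string; // short summary, e.g. produced by the LLM step
  markdown: string; // full page content as markdown
}

// Compact index (assumed llms.txt layout): one link line per page.
function buildLlmsTxt(siteName: string, pages: CrawledPage[]): string {
  const lines = pages.map((p) => `- [${p.title}](${p.url}): ${p.description}`);
  return [`# ${siteName}`, "", ...lines].join("\n") + "\n";
}

// Comprehensive file (assumed llms-full.txt layout): each page's full content
// under its own heading, with the source URL on the line below the heading.
function buildLlmsFullTxt(siteName: string, pages: CrawledPage[]): string {
  const sections = pages.map(
    (p) => `## ${p.title}\nSource: ${p.url}\n\n${p.markdown.trim()}`
  );
  return [`# ${siteName}`, ...sections].join("\n\n") + "\n";
}
```

In this sketch the index stays small enough to fit in a prompt, while the full file carries every page's content for training or retrieval.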
Quick Start & Requirements
Hosted version: llmstxt.firecrawl.dev for browser-based generation.
API: GET https://llmstxt.firecrawl.dev/[YOUR_URL_HERE]
Local development: run npm install and npm run dev.
A .env file with FIRECRAWL_API_KEY, SUPABASE_URL, SUPABASE_KEY, and OPENAI_API_KEY is necessary for local setup.
Highlighted Details
Produces both llms.txt and llms-full.txt output formats.
Maintenance & Community
The project is associated with @firecrawl_dev. Further community or maintenance details are not provided in the README.
Licensing & Compatibility
The licensing information is not specified in the README.
Limitations & Caveats
Processing times can be several minutes due to crawling and LLM operations. Local development requires specific API keys for FireCrawl, Supabase, and OpenAI.
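Both caveats can be handled defensively in client code. The sketch below appends the target URL to the hosted endpoint exactly as the GET pattern above is written, allows several minutes before giving up, and reports which of the four environment variables are unset before a local run. The helper names, the five-minute default, and the assumption that the response is best read as text are illustrative and not confirmed by the README.

```typescript
const HOSTED_BASE = "https://llmstxt.firecrawl.dev/";

// Variable names taken from the local setup requirements.
const REQUIRED_ENV = [
  "FIRECRAWL_API_KEY",
  "SUPABASE_URL",
  "SUPABASE_KEY",
  "OPENAI_API_KEY",
];

// Assumption: the target URL is appended verbatim, with no extra encoding.
function buildRequestUrl(targetUrl: string): string {
  return HOSTED_BASE + targetUrl;
}

// Return the names of required variables that are unset or empty.
function missingEnvVars(env: Record<string, string | undefined>): string[] {
  return REQUIRED_ENV.filter((name) => !env[name]);
}

// Fetch the generated text, aborting if the server takes longer than timeoutMs.
async function fetchLlmsTxt(
  targetUrl: string,
  timeoutMs: number = 5 * 60 * 1000 // hypothetical default: five minutes
): Promise<string> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    const response = await fetch(buildRequestUrl(targetUrl), {
      signal: controller.signal,
    });
    if (!response.ok) {
      throw new Error(`Request failed with status ${response.status}`);
    }
    return await response.text();
  } finally {
    clearTimeout(timer); // always release the timer, even on failure
  }
}
```

For a local setup, calling `missingEnvVars(process.env)` before `npm run dev` gives a quick list of any keys still absent from the environment.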