llmstxt-generator by mendableai

CLI tool for LLM training/inference text file generation

created 8 months ago
442 stars

Top 68.9% on sourcepulse

Project Summary

This project generates consolidated text files from websites for use in Large Language Model (LLM) training and inference. It targets developers and researchers who need to turn web content into LLM-ready data with minimal setup.

How It Works

The system uses FireCrawl to crawl the specified URLs and extract their content. It then uses GPT-4-mini to process the extracted text and consolidate it into a single text file. Two output formats are generated: a standard llms.txt and a more comprehensive llms-full.txt.
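
The relationship between the two outputs can be illustrated with a small sketch. This is not the project's code: the Page type, the describe placeholder (which stands in for the LLM step by taking the first line of a page), and the exact layout of both files are assumptions made for illustration. It only shows the general idea of a short link index versus a full-content file built from the same crawled pages.

```python
from dataclasses import dataclass


@dataclass
class Page:
    """One crawled page (hypothetical shape; a real crawler returns more fields)."""
    url: str
    title: str
    markdown: str


def describe(page: Page, max_chars: int = 80) -> str:
    """Placeholder for the LLM step: first non-empty line, heading marks removed."""
    for line in page.markdown.splitlines():
        text = line.strip().lstrip("#").strip()
        if text:
            return text[:max_chars]
    return ""


def build_llms_txt(site_name: str, pages: list[Page]) -> str:
    """Short index: one link line per page with a brief description."""
    lines = [f"# {site_name}", ""]
    for page in pages:
        lines.append(f"- [{page.title}]({page.url}): {describe(page)}")
    return "\n".join(lines) + "\n"


def build_llms_full_txt(site_name: str, pages: list[Page]) -> str:
    """Full variant: the complete extracted text of every page."""
    parts = [f"# {site_name}", ""]
    for page in pages:
        parts += [f"## {page.title}", f"Source: {page.url}", "", page.markdown.strip(), ""]
    return "\n".join(parts)


# Sample data only; example.com is a stand-in for a real crawled site.
pages = [
    Page("https://example.com/", "Home", "# Welcome\nExample landing page."),
    Page("https://example.com/docs", "Docs", "How to use the example service."),
]
print(build_llms_txt("Example", pages))
print(build_llms_full_txt("Example", pages))
```

In the actual tool the descriptions would come from the language model rather than from a first-line heuristic, so the real files will differ in wording and possibly in layout.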

Quick Start & Requirements

  • Web Interface: Visit llmstxt.firecrawl.dev for browser-based generation.
  • API Endpoint: GET https://llmstxt.firecrawl.dev/[YOUR_URL_HERE]
  • Local Development: Run npm install, then npm run dev.
  • Prerequisites: A .env file with FIRECRAWL_API_KEY, SUPABASE_URL, SUPABASE_KEY, and OPENAI_API_KEY is necessary for local setup.
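
As a rough illustration of the API endpoint above, the following sketch builds the request URL and fetches the result using only the Python standard library. It assumes the target URL is appended to the base address as-is (the summary does not say whether it must be percent-encoded) and that the response body is plain text; build_endpoint and fetch_llms_txt are hypothetical helper names, and the 600-second timeout is an arbitrary choice.

```python
import urllib.request

BASE_URL = "https://llmstxt.firecrawl.dev"


def build_endpoint(target_url: str) -> str:
    """Assumption: the target URL is appended to the base address unchanged."""
    return f"{BASE_URL}/{target_url}"


def fetch_llms_txt(target_url: str, timeout_s: float = 600.0) -> str:
    """Send the GET request and return the body, assumed to be UTF-8 text."""
    with urllib.request.urlopen(build_endpoint(target_url), timeout=timeout_s) as resp:
        return resp.read().decode("utf-8")


# Building the URL needs no network access; fetching is left to the caller.
print(build_endpoint("https://example.com"))
```

Calling fetch_llms_txt("https://example.com") would perform the actual request. A generous timeout is used because generation may take a while.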

Highlighted Details

  • Powered by FireCrawl for web crawling and GPT-4-mini for text processing.
  • Generates both llms.txt and llms-full.txt output formats.
  • Offers both a web interface and an API endpoint.
  • No API key required for basic usage via the web interface.

Maintenance & Community

The project is associated with @firecrawl_dev. Further community or maintenance details are not provided in the README.

Licensing & Compatibility

Licensing information is not specified in the README.

Limitations & Caveats

Processing can take several minutes because each request involves both crawling and LLM calls. Local development requires API keys for FireCrawl, Supabase, and OpenAI.
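
Since local setup depends on four environment variables, a quick pre-flight check can catch a missing key before the dev server starts. The sketch below is an assumption-laden helper, not part of the project: parse_env handles only simple KEY=VALUE lines, and missing_keys reports which of the four variable names listed in this summary are absent or empty. The sample values are placeholders.

```python
REQUIRED_KEYS = ("FIRECRAWL_API_KEY", "SUPABASE_URL", "SUPABASE_KEY", "OPENAI_API_KEY")


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines; blank lines and # comments are skipped."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(env: dict[str, str]) -> list[str]:
    """Return the required keys that are absent or empty, in declaration order."""
    return [key for key in REQUIRED_KEYS if not env.get(key)]


# Placeholder contents; a real check would read the project's .env file instead.
sample = (
    "FIRECRAWL_API_KEY=fc-placeholder\n"
    "SUPABASE_URL=https://example.supabase.co\n"
    "# OPENAI_API_KEY not set yet\n"
)
print(missing_keys(parse_env(sample)))  # ['SUPABASE_KEY', 'OPENAI_API_KEY']
```

To check a real file, the same functions can be applied to the text of .env, for example via pathlib.Path(".env").read_text().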

Health Check

  • Last commit: 1 month ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 81 stars in the last 90 days
