llm-min.txt by marv1nnnnn

CLI tool for compressing tech docs into a structured format for LLMs

created 3 months ago
646 stars

Top 52.5% on sourcepulse

View on GitHub
Project Summary

This project addresses the problem of Large Language Models (LLMs) holding outdated knowledge about rapidly evolving software libraries because of their training "knowledge cutoffs". It compresses technical documentation into a highly structured, machine-optimized format called llm-min.txt, so AI assistants can access up-to-date information efficiently. The target audience is developers who use AI coding assistants and need to work around these knowledge gaps.

How It Works

The project uses Google's Gemini AI to distill technical documentation into a compact, machine-readable format called the Structured Knowledge Format (SKF). This format organizes information into three core sections: Definitions (D), Interactions (I), and Usage Patterns (U), using precise line-based conventions. The approach reduces token count by 90-97% while preserving essential programmatic details, allowing LLMs to ingest and process library documentation far more efficiently than raw text.
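
To illustrate the consumption side of this workflow, here is a minimal sketch that reads a generated llm-min.txt file and prepends it to an assistant prompt. The output path and the prompt wording are assumptions for illustration only; the SKF content itself is whatever llm-min produced.

    from pathlib import Path

    # Assumed output location: the Quick Start example writes into the
    # "my_docs" directory passed via -o; adjust to your actual layout.
    skf_docs = Path("my_docs/llm-min.txt").read_text(encoding="utf-8")

    # Prepend the compressed SKF documentation to a coding-assistant
    # prompt so the model answers from current docs rather than its
    # training-time knowledge. The prompt template is illustrative.
    prompt = (
        "Use the following compressed SKF documentation for the 'typer' "
        "library as ground truth:\n\n"
        + skf_docs
        + "\n\nQuestion: how do I declare an optional CLI flag?"
    )
    print(prompt[:500])  # preview the assembled prompt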

Quick Start & Requirements

  • Install: pip install llm-min
  • Prerequisites: playwright (for browser automation), Gemini API Key (set as GEMINI_API_KEY environment variable or via --gemini-api-key flag).
  • Usage: llm-min -pkg "typer" -o my_docs -p 50, or programmatically via LLMMinGenerator (see the sketch after this list).
  • Documentation: GitHub Repository
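
Because the exact LLMMinGenerator API is not documented here, the sketch below simply drives the documented CLI from Python, using only the flags and the GEMINI_API_KEY variable listed above.

    import os
    import subprocess

    # The API key is read from the environment, as documented; it can
    # alternatively be passed with --gemini-api-key.
    if "GEMINI_API_KEY" not in os.environ:
        raise RuntimeError("set GEMINI_API_KEY before running llm-min")

    # Same invocation as the Quick Start example: compress the docs for
    # the "typer" package into the my_docs output directory.
    subprocess.run(
        ["llm-min", "-pkg", "typer", "-o", "my_docs", "-p", "50"],
        check=True,
    )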

Highlighted Details

  • Achieves 90-97% token reduction in documentation.
  • Utilizes a custom SKF format (SKF/1.4 LA) for structured AI parsing.
  • Supports generation from Python packages, URLs, and local directories.
  • Recommends gemini-2.5-flash-preview-04-17 for its reasoning and context window capabilities.

Maintenance & Community

The project is maintained by marv1nnnnn. Contributions are welcomed via GitHub pull requests.

Licensing & Compatibility

Licensed under the MIT License, permitting commercial use and integration with closed-source projects.

Limitations & Caveats

The llm-min.txt format is explicitly lossy: it omits explanatory prose and peripheral information to achieve its compression. Generation can be time-consuming because of the multi-stage AI pipeline, and very dense documentation may trigger MAX_TOKENS errors, which may require reducing the chunk size.
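
If MAX_TOKENS errors do occur, one pragmatic workaround is to re-run generation with progressively smaller chunks. Note that --chunk-size below is an assumed flag name inferred from the caveat above; check llm-min --help for the actual option.

    import subprocess

    # Hedged retry loop: shrink the chunk size until generation succeeds.
    # "--chunk-size" is an assumed flag name, not confirmed against the CLI.
    for chunk_size in (50, 25, 10):
        result = subprocess.run(
            ["llm-min", "-pkg", "typer", "-o", "my_docs",
             "--chunk-size", str(chunk_size)],
        )
        if result.returncode == 0:
            break
        print(f"chunk size {chunk_size} failed, retrying with a smaller one")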

Health Check

  • Last commit: 1 month ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 2
  • Issues (30d): 1
  • Star History: 648 stars in the last 90 days

Explore Similar Projects

Starred by John Resig (Author of jQuery; Chief Software Architect at Khan Academy), Travis Fischer (Founder of Agentic), and 1 more.

instructor-js by 567-labs
  • TypeScript tool for structured extraction from LLMs
  • 0%, 738 stars, created 1 year ago, updated 6 months ago

Starred by Tobi Lutke (Cofounder of Shopify), Stas Bekman (Author of Machine Learning Engineering Open Book; Research Engineer at Snowflake), and 21 more.

guidance by guidance-ai
  • Guidance is a programming paradigm for steering LLMs
  • 0.1%, 21k stars, created 2 years ago, updated 1 day ago