llm-min.txt by marv1nnnnn

CLI tool for compressing tech docs into a structured format for LLMs

created 3 months ago
646 stars

Top 52.5% on sourcepulse

View on GitHub
Project Summary

This project addresses the problem of Large Language Models (LLMs) holding outdated knowledge about rapidly evolving software libraries because of their training "knowledge cutoffs". It compresses technical documentation into a highly structured, machine-optimized format called llm-min.txt, so AI assistants can access up-to-date information efficiently. The target audience is developers who use AI coding assistants and need to work around these knowledge gaps.

How It Works

The project uses Google's Gemini AI to distill technical documentation into a compact, machine-readable format called the Structured Knowledge Format (SKF). This format organizes information into three core sections: Definitions (D), Interactions (I), and Usage Patterns (U), using precise line-based conventions. The approach reduces token count by 90-97% while preserving essential programmatic details, allowing LLMs to ingest and process library documentation far more efficiently than raw text.
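
To illustrate the consumption side of this workflow, here is a minimal sketch that reads a generated llm-min.txt file and prepends it to an assistant prompt. The output path and the prompt wording are assumptions for illustration only; the SKF content itself is whatever llm-min produced.

    from pathlib import Path

    # Assumed output location: the Quick Start example writes into the
    # "my_docs" directory passed via -o; adjust to your actual layout.
    skf_docs = Path("my_docs/llm-min.txt").read_text(encoding="utf-8")

    # Prepend the compressed SKF documentation to a coding-assistant
    # prompt so the model answers from current docs rather than its
    # training-time knowledge. The prompt template is illustrative.
    prompt = (
        "Use the following compressed SKF documentation for the 'typer' "
        "library as ground truth:\n\n"
        + skf_docs
        + "\n\nQuestion: how do I declare an optional CLI flag?"
    )
    print(prompt[:500])  # preview the assembled prompt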

Quick Start & Requirements

  • Install: pip install llm-min
  • Prerequisites: playwright (for browser automation), Gemini API Key (set as GEMINI_API_KEY environment variable or via --gemini-api-key flag).
  • Usage: llm-min -pkg "typer" -o my_docs -p 50, or programmatically via LLMMinGenerator (see the sketch after this list).
  • Documentation: GitHub Repository
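
Because the exact LLMMinGenerator API is not documented here, the sketch below simply drives the documented CLI from Python, using only the flags and the GEMINI_API_KEY variable listed above.

    import os
    import subprocess

    # The API key is read from the environment, as documented; it can
    # alternatively be passed with --gemini-api-key.
    if "GEMINI_API_KEY" not in os.environ:
        raise RuntimeError("set GEMINI_API_KEY before running llm-min")

    # Same invocation as the Quick Start example: compress the docs for
    # the "typer" package into the my_docs output directory.
    subprocess.run(
        ["llm-min", "-pkg", "typer", "-o", "my_docs", "-p", "50"],
        check=True,
    )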

Highlighted Details

  • Achieves 90-97% token reduction in documentation.
  • Utilizes a custom SKF format (SKF/1.4 LA) for structured AI parsing.
  • Supports generation from Python packages, URLs, and local directories.
  • Recommends gemini-2.5-flash-preview-04-17 for its reasoning and context window capabilities.

Maintenance & Community

The project is maintained by marv1nnnnn. Contributions are welcomed via GitHub pull requests.

Licensing & Compatibility

Licensed under the MIT License, permitting commercial use and integration with closed-source projects.

Limitations & Caveats

The llm-min.txt format is explicitly lossy: it omits explanatory prose and peripheral information to achieve its compression. Generation can be time-consuming because of the multi-stage AI pipeline, and very dense documentation may trigger MAX_TOKENS errors, which may require reducing the chunk size.
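
If MAX_TOKENS errors do occur, one pragmatic workaround is to re-run generation with progressively smaller chunks. Note that --chunk-size below is an assumed flag name inferred from the caveat above; check llm-min --help for the actual option.

    import subprocess

    # Hedged retry loop: shrink the chunk size until generation succeeds.
    # "--chunk-size" is an assumed flag name, not confirmed against the CLI.
    for chunk_size in (50, 25, 10):
        result = subprocess.run(
            ["llm-min", "-pkg", "typer", "-o", "my_docs",
             "--chunk-size", str(chunk_size)],
        )
        if result.returncode == 0:
            break
        print(f"chunk size {chunk_size} failed, retrying with a smaller one")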

Health Check

  • Last commit: 1 month ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 2
  • Issues (30d): 1
  • Star History: 648 stars in the last 90 days

Explore Similar Projects

Starred by John Resig (Author of jQuery; Chief Software Architect at Khan Academy), Travis Fischer (Founder of Agentic), and 1 more.

instructor-js by 567-labs
  • TypeScript tool for structured extraction from LLMs
  • 0%, 738 stars, created 1 year ago, updated 6 months ago

Starred by Tobi Lutke (Cofounder of Shopify), Stas Bekman (Author of Machine Learning Engineering Open Book; Research Engineer at Snowflake), and 21 more.

guidance by guidance-ai
  • Guidance is a programming paradigm for steering LLMs
  • 0.1%, 21k stars, created 2 years ago, updated 1 day ago