Markdown generator for LLM ingestion
This tool parses local files and websites into a single Markdown document, or pipes the result directly to an LLM, targeting developers and researchers who prepare data for AI models. It streamlines data ingestion with features such as code compression, VRAM estimation, and direct LLM integration, reducing manual effort and improving compatibility with AI models.
How It Works
Ingest traverses directory structures, optionally compressing code with Tree-sitter to retain structural information while omitting implementation details. It tokenizes the content and can either save the output to a file or send it directly to an LLM via an OpenAI-compatible API such as Ollama's. A key feature is VRAM estimation and model-compatibility checking, which leverages a separate package to help users determine whether their data fits within a given model's constraints.
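To make that pipeline concrete, here is a minimal Go sketch of the kind of flow described above: walk a tree, wrap each file in a fenced block, estimate tokens, and post the result to an OpenAI-compatible endpoint. Everything here is illustrative rather than Ingest's actual code; collectMarkdown, the llama3 model tag, and the bytes/4 token heuristic are assumptions, while /v1/chat/completions is Ollama's real OpenAI-compatible route.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io/fs"
	"net/http"
	"os"
	"path/filepath"
	"strings"
)

// collectMarkdown walks root and concatenates every regular file into
// one markdown document, each file under its own heading and fence.
// Hypothetical helper, not Ingest's actual internals.
func collectMarkdown(root string) (string, error) {
	var sb strings.Builder
	err := filepath.WalkDir(root, func(path string, d fs.DirEntry, walkErr error) error {
		if walkErr != nil || d.IsDir() {
			return walkErr
		}
		data, err := os.ReadFile(path)
		if err != nil {
			return err
		}
		fmt.Fprintf(&sb, "## %s\n\n```\n%s\n```\n\n", path, data)
		return nil
	})
	return sb.String(), err
}

func main() {
	doc, err := collectMarkdown(".")
	if err != nil {
		panic(err)
	}
	// Crude stand-in for tokenization: ~4 bytes per token. The real
	// tool uses the cl100k_base tokenizer for its counts.
	fmt.Printf("~%d tokens\n", len(doc)/4)

	// Send the result to an OpenAI-compatible endpoint (Ollama exposes
	// one at /v1/chat/completions). The model name is an assumption.
	payload, err := json.Marshal(map[string]any{
		"model": "llama3",
		"messages": []map[string]string{
			{"role": "user", "content": "Summarise this codebase:\n" + doc},
		},
	})
	if err != nil {
		panic(err)
	}
	resp, err := http.Post("http://localhost:11434/v1/chat/completions",
		"application/json", bytes.NewReader(payload))
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	fmt.Println("LLM status:", resp.Status)
}
```

In the real tool, the crude length heuristic is replaced by proper cl100k_base tokenization, and the VRAM and model-compatibility check is delegated to the separate estimation package mentioned above.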
Quick Start & Requirements
Install (recommended):
go install github.com/sammcj/ingest@HEAD
Alternatively, use the provided curl script. The tool downloads the cl100k_base.tiktoken tokenizer on its first run.
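A first run might then look like the line below; this summary does not document the CLI flags, so treat the invocation as an assumption and check the project README for the real syntax.
ingest /path/to/your/project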
Highlighted Details
Maintenance & Community
The project's last recorded activity was 2 months ago, and it is currently flagged as inactive.
Licensing & Compatibility
Limitations & Caveats
The Tree-sitter compression is experimental and currently supports a limited set of languages. The README notes that version printing (-V) is a work in progress.