ingest by sammcj

Markdown generator for LLM ingestion

created 1 year ago
285 stars

Top 92.8% on sourcepulse

Project Summary

This tool converts files and websites into a single markdown file, or passes the result directly to an LLM. It targets developers and researchers preparing data for AI models, and offers code compression, VRAM estimation, and LLM integration to reduce manual preparation work.

How It Works

Ingest traverses directory structures and can optionally compress code with Tree-sitter, retaining structural information while omitting implementation details. It tokenizes the content, then either sends it to an LLM through an OpenAI-compatible API (such as Ollama) or saves the output to a file. It also estimates VRAM usage and checks model compatibility, using a separate package, to help users determine whether their data fits within a given model's constraints.

Quick Start & Requirements

  • Install: go install github.com/sammcj/ingest@HEAD (recommended) or via a provided curl script.
  • Prerequisites: a Go installation. The cl100k_base.tiktoken tokenizer is downloaded on first run.
  • Docs: https://github.com/sammcj/ingest

Highlighted Details

  • Code compression using Tree-sitter for Go, Python, JavaScript, Bash, C, and CSS.
  • VRAM estimation and model compatibility checks for GGUF and ExLlamaV2 models.
  • Direct LLM integration with Ollama and OpenAI-compatible APIs.
  • Web crawling capabilities with domain restrictions and depth control.
  • Git diff and log inclusion for version-controlled projects.
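
As an illustration of the crawl limits listed above, the following Go sketch shows one way a crawler might decide whether to follow a link. It is an assumption-laden example rather than ingest's implementation: the shouldVisit function, its parameters, and the rule that subdomains of an allowed domain are accepted are all hypothetical.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// shouldVisit is a hypothetical filter (not ingest's API). It reports whether
// a link found at the given crawl depth may be followed, given a maximum
// depth and a list of allowed domains.
func shouldVisit(rawURL string, depth, maxDepth int, allowedDomains []string) bool {
	// Depth control: stop once the link is deeper than the limit.
	if depth > maxDepth {
		return false
	}
	u, err := url.Parse(rawURL)
	if err != nil || (u.Scheme != "http" && u.Scheme != "https") {
		return false
	}
	// Domain restriction: accept an exact match or any subdomain (assumption).
	host := strings.ToLower(u.Hostname())
	for _, d := range allowedDomains {
		d = strings.ToLower(d)
		if host == d || strings.HasSuffix(host, "."+d) {
			return true
		}
	}
	return false
}

func main() {
	allowed := []string{"example.com"}
	fmt.Println(shouldVisit("https://docs.example.com/guide", 1, 2, allowed)) // true
	fmt.Println(shouldVisit("https://other.org/", 1, 2, allowed))             // false: domain not allowed
	fmt.Println(shouldVisit("https://example.com/deep", 3, 2, allowed))       // false: too deep
}
```

Matching on "."+domain rather than a bare suffix avoids accepting look-alike hosts such as notexample.com.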

Maintenance & Community

  • Project maintained by Sam McLeod.
  • Contributions are welcome via Pull Requests.

Licensing & Compatibility

  • MIT License. Permissive for commercial use and closed-source linking.

Limitations & Caveats

The Tree-sitter compression is experimental and currently supports a limited set of languages. The README notes that version printing (-V) is a work in progress.

Health Check

  • Last commit: 2 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0

Star History

  • 75 stars in the last 90 days
