toonify by ScrapeGraphAI

A compact data format for efficient LLM communication

Created 3 months ago

312 stars

Top 86.6% on SourcePulse

Project Summary

Summary

This project targets the token cost and context-window pressure of passing structured data to Large Language Models (LLMs). It introduces TOON (Token-Oriented Object Notation), a compact, human-readable serialization format that represents the same data in fewer tokens than JSON. For developers and researchers working with LLM APIs, fewer tokens per request means lower cost and more room in the context window.

How It Works

TOON achieves its compactness by adopting a CSV-like structure for uniform arrays (the field names are written once, followed by one row of values per item) and by employing techniques like key folding for nested objects. It supports standard data types (strings, numbers, booleans, null) while preserving data structure and types. The project's benchmarks report output that is on average 64% smaller than the equivalent JSON; because token count scales with payload size, this translates into fewer tokens consumed (the README cites a 30-60% token reduction in LLM API calls).
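The CSV-like layout for uniform arrays can be illustrated with a minimal sketch. This is not the toonify implementation: the helper names encode_uniform_array and format_scalar are hypothetical, the header syntax (name, row count, field list) is an assumption based on the TOON format as commonly shown, and the quoting rule is deliberately simplified.

```python
import json


def format_scalar(value):
    """Render one value in a compact, JSON-compatible spelling (simplified)."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # check bool before int: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return str(value)
    text = str(value)
    # Assumption: quote only strings that would be ambiguous in a comma-separated row.
    if "," in text or "\n" in text or '"' in text:
        return json.dumps(text)
    return text


def encode_uniform_array(name, rows):
    """Write the field names once, then one comma-separated line per row."""
    if not rows:
        return name + "[0]:"
    fields = list(rows[0].keys())
    # The tabular form only applies when every row has exactly the same keys.
    if any(list(row.keys()) != fields for row in rows):
        raise ValueError("rows are not uniform")
    header = name + "[" + str(len(rows)) + "]{" + ",".join(fields) + "}:"
    lines = [header]
    for row in rows:
        lines.append("  " + ",".join(format_scalar(row[f]) for f in fields))
    return "\n".join(lines)


users = [
    {"id": 1, "name": "Alice", "active": True},
    {"id": 2, "name": "Bob", "active": False},
]
toon_text = encode_uniform_array("users", users)
json_text = json.dumps({"users": users})
print(toon_text)
print(len(toon_text), "characters vs", len(json_text), "for JSON")
```

The saving comes from not repeating the keys, quotes, and braces for every item; the more rows an array has, the larger the relative gain.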

Quick Start & Requirements

Installation is straightforward via pip: pip install toonify. Development dependencies can be installed with pip install toonify[dev], and Pydantic support requires pip install toonify[pydantic]. The library provides both a Python API for programmatic use and a command-line interface (CLI) for file conversions. No specific hardware or advanced software prerequisites are mentioned beyond a standard Python environment.

Highlighted Details

Performance: Benchmarked at 64% smaller than JSON on average, which the README equates to a 30-60% token reduction in LLM API calls and correspondingly lower API costs.
Pydantic Integration: Encodes and decodes Pydantic models (v1 and v2), with support for nested structures, field aliases, and validation.
LLM Prompt Templates: Features generate_structure and generate_structure_from_pydantic to create unambiguous prompt templates, eliminating the need for example data and saving tokens.
Advanced Features: Includes optional key folding for collapsing nested keys into dotted paths and path expansion for decoding dotted paths into nested objects, offering flexibility and further optimization.
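Key folding and path expansion, as described in the last item, can be sketched as a pair of inverse functions. This is a simplified illustration, not the library's code: the names fold_keys and expand_paths are hypothetical, every nested dictionary is flattened (the library's folding rules may be more selective), and keys are assumed not to contain dots themselves.

```python
def fold_keys(obj, prefix=""):
    """Collapse nested dict keys into dotted paths, e.g. {"a": {"b": 1}} -> {"a.b": 1}."""
    flat = {}
    for key, value in obj.items():
        path = prefix + "." + key if prefix else key
        if isinstance(value, dict) and value:
            # Recurse into non-empty dicts, extending the dotted path.
            flat.update(fold_keys(value, path))
        else:
            flat[path] = value
    return flat


def expand_paths(flat):
    """Rebuild nested dicts from dotted paths (inverse of fold_keys)."""
    nested = {}
    for path, value in flat.items():
        *parents, leaf = path.split(".")
        node = nested
        for part in parents:
            # Create intermediate dicts on demand while walking down the path.
            node = node.setdefault(part, {})
        node[leaf] = value
    return nested


config = {"server": {"host": "localhost", "port": 8080}, "debug": True}
folded = fold_keys(config)
restored = expand_paths(folded)
print(folded)
print(restored == config)
```

Folding removes a level of nesting per key, which is where the extra compactness comes from; expansion restores the original shape on decode, so the round trip is lossless under the stated assumptions.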

Maintenance & Community

The project is developed by the ScrapeGraph team. The primary community and development hub is the GitHub repository. No specific community channels like Discord or Slack, nor a public roadmap, are detailed in the README.

Licensing & Compatibility

The project is released under the permissive MIT License. This license generally allows for broad usage, including commercial applications and integration into closed-source projects, with minimal restrictions beyond attribution.

Limitations & Caveats

The provided README focuses on the benefits and features of the TOON format and the toonify library. It does not explicitly detail any current limitations, known bugs, alpha status, or unsupported platforms.

Explore Similar Projects

Shapeshift by rectanglehq

python-toon by xaviviro

easydoc by easydoc-ai

super-json-mode by varunshenoy

prompt-optimizer by vaibkumr

strictjson by tanchongmin

json.cpp by jart

seqio by google

open-parse by Filimoa

datatrove by huggingface

toon by toon-format

unstructured by Unstructured-IO