toonify by ScrapeGraphAI

A compact data format for efficient LLM communication

Created 2 months ago
291 stars

Top 90.7% on SourcePulse

Project Summary

Summary

This project targets the token cost and context-window limits of Large Language Model (LLM) interactions. It introduces TOON (Token-Oriented Object Notation), a compact, human-readable serialization format that reduces the number of tokens needed to pass structured data to LLMs. For developers and researchers working with LLM APIs, this means lower API costs and more room left in the context window.

How It Works

TOON gets its compactness from two techniques. Uniform arrays use a CSV-like layout: the field names are declared once, and each element is written as a row of values. Nested objects can be shortened with key folding. The format supports the standard data types (strings, numbers, booleans, null) and preserves both data structure and types. The resulting representation is substantially smaller than the equivalent JSON. Benchmarks report an average size reduction of 64%, which means fewer tokens consumed.

Quick Start & Requirements

Installation is straightforward via pip: pip install toonify. Development dependencies can be installed with pip install toonify[dev], and Pydantic support requires pip install toonify[pydantic]. The library provides both a Python API for programmatic use and a command-line interface (CLI) for file conversions. No specific hardware or advanced software prerequisites are mentioned beyond a standard Python environment.

Highlighted Details

  • Performance: Benchmarked as 64% smaller than JSON on average, which yields a 30-60% token reduction in LLM API calls and a corresponding reduction in API cost.
  • Pydantic Integration: Seamlessly encodes and decodes Pydantic models (v1/v2), supporting nested structures, field aliases, and validation for robust data handling.
  • LLM Prompt Templates: Features generate_structure and generate_structure_from_pydantic to create unambiguous prompt templates, eliminating the need for example data and saving tokens.
  • Advanced Features: Includes optional key folding for collapsing nested keys into dotted paths and path expansion for decoding dotted paths into nested objects, offering flexibility and further optimization.
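The key folding and path expansion described above can be illustrated with a pair of inverse functions. The first collapses nested objects into dotted keys, and the second rebuilds the nesting. `fold_keys` and `expand_paths` are hypothetical names used only for this sketch and are not part of toonify's API. This version folds every nested object completely. The library's optional key folding may apply narrower rules, for example about which chains get folded or how collisions are handled.

```python
from typing import Any


def fold_keys(obj: dict[str, Any], sep: str = ".", prefix: str = "") -> dict[str, Any]:
    """Collapse nested dicts into a flat dict with dotted keys (hypothetical helper)."""
    folded: dict[str, Any] = {}
    for key, value in obj.items():
        path = f"{prefix}{sep}{key}" if prefix else key
        if isinstance(value, dict) and value:
            # Recurse into non-empty dicts; their keys become longer dotted paths.
            folded.update(fold_keys(value, sep, path))
        else:
            # Leaves (and empty dicts) are stored under their full dotted path.
            folded[path] = value
    return folded


def expand_paths(flat: dict[str, Any], sep: str = ".") -> dict[str, Any]:
    """Rebuild nested dicts from dotted keys (inverse of fold_keys, hypothetical helper)."""
    root: dict[str, Any] = {}
    for key, value in flat.items():
        *parents, leaf = key.split(sep)
        node = root
        for part in parents:
            child = node.setdefault(part, {})
            # A dotted path cannot descend into a value that is not an object.
            if not isinstance(child, dict):
                raise ValueError(f"path conflict at {key!r}: {part!r} is not an object")
            node = child
        node[leaf] = value
    return root


config = {"server": {"http": {"port": 8080}, "name": "api"}, "debug": False}
flat = fold_keys(config)
print(flat)
# Round-trips as long as no original key contains the separator itself.
assert expand_paths(flat) == config
```

The round trip only holds when no original key contains the separator. A key such as `"a.b"` would be split into two levels on expansion. This is why a real encoder has to treat such keys specially or leave them unfolded.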

Maintenance & Community

The project is developed by the ScrapeGraph team. The primary community and development hub is the GitHub repository. No specific community channels like Discord or Slack, nor a public roadmap, are detailed in the README.

Licensing & Compatibility

The project is released under the permissive MIT License. This license generally allows for broad usage, including commercial applications and integration into closed-source projects, with minimal restrictions beyond attribution.

Limitations & Caveats

The provided README focuses on the benefits and features of the TOON format and the toonify library. It does not explicitly detail any current limitations, known bugs, alpha status, or unsupported platforms.

Health Check

Last Commit: 2 weeks ago
Responsiveness: Inactive
Pull Requests (30d): 1
Issues (30d): 0
Star History: 22 stars in the last 30 days
