CLI tool for counting and truncating text based on tokens
ttok is a command-line utility for counting and truncating text based on token counts, primarily for use with Large Language Models (LLMs). It leverages OpenAI's tiktoken library, making it useful for developers and researchers working with LLM APIs that have token-based pricing or context window limits.
How It Works
The tool uses the tiktoken library to encode text into integer token IDs, mirroring how LLMs process input. It supports various OpenAI models, letting users specify the model via the -m flag so tokenization matches the target LLM. The core functionality is counting tokens in provided text or piped input and truncating text to a specified token limit with the -t flag.
Quick Start & Requirements
pip install ttok
brew install simonw/llm/ttok
Highlighted Details
- Encode text into integer token IDs (--encode) and decode them back to text (--decode).
- Output the raw token integers (--tokens).
- Read input from standard input (-i -).
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
The tool relies on the tiktoken library, so its accuracy depends on that library's updates and model coverage. The README does not mention unsupported platforms or known bugs.