pkoukk: Go library for OpenAI's tiktoken
Top 42.9% on SourcePulse
This Go library provides a port of OpenAI's tiktoken BPE tokenizer, enabling Go developers to efficiently tokenize text for OpenAI models. It offers similar caching mechanisms and allows for custom BPE loaders, including an offline option for environments without runtime downloads.
How It Works
The library implements the Byte Pair Encoding (BPE) algorithm, mirroring the functionality of the original Python tiktoken. It caches token dictionaries locally, configurable via the TIKTOKEN_CACHE_DIR environment variable, to avoid repeated downloads. Users can also provide custom BPE loaders by implementing the BpeLoader interface, facilitating offline usage or alternative data sourcing.
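As a rough illustration of the custom-loader idea, the sketch below serves a vocabulary from memory instead of downloading it. It uses only the standard library. The interface shape (method name and signature) and the embeddedLoader type are assumptions made for this example and should be checked against the library before use. The parser assumes the .tiktoken file layout of one base64-encoded token and its integer rank per line.

```go
package main

import (
	"bufio"
	"encoding/base64"
	"fmt"
	"io"
	"strconv"
	"strings"
)

// BpeLoader mirrors the interface named in the text. The method name and
// signature are assumptions for illustration, not a confirmed API.
type BpeLoader interface {
	LoadTiktokenBpe(source string) (map[string]int, error)
}

// parseTiktokenBpe reads lines of the form "<base64 token> <rank>" and
// returns a map from raw token bytes (as a string) to merge rank.
func parseTiktokenBpe(r io.Reader) (map[string]int, error) {
	ranks := make(map[string]int)
	sc := bufio.NewScanner(r)
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" {
			continue // tolerate blank lines
		}
		fields := strings.Fields(line)
		if len(fields) != 2 {
			return nil, fmt.Errorf("malformed line %q", line)
		}
		token, err := base64.StdEncoding.DecodeString(fields[0])
		if err != nil {
			return nil, fmt.Errorf("decode token %q: %w", fields[0], err)
		}
		rank, err := strconv.Atoi(fields[1])
		if err != nil {
			return nil, fmt.Errorf("parse rank %q: %w", fields[1], err)
		}
		ranks[string(token)] = rank
	}
	return ranks, sc.Err()
}

// embeddedLoader is a hypothetical offline loader: vocabularies live in
// memory (for example via go:embed), so nothing is fetched at runtime.
type embeddedLoader struct {
	files map[string]string // source name -> file contents
}

func (l embeddedLoader) LoadTiktokenBpe(source string) (map[string]int, error) {
	data, ok := l.files[source]
	if !ok {
		return nil, fmt.Errorf("no embedded vocabulary for %q", source)
	}
	return parseTiktokenBpe(strings.NewReader(data))
}

func main() {
	// Toy vocabulary: "a" -> 0, "b" -> 1, "ab" -> 2.
	var loader BpeLoader = embeddedLoader{files: map[string]string{
		"toy.tiktoken": "YQ== 0\nYg== 1\nYWI= 2\n",
	}}
	ranks, err := loader.LoadTiktokenBpe("toy.tiktoken")
	if err != nil {
		panic(err)
	}
	fmt.Println(len(ranks), ranks["ab"])
}
```

Keeping the parser separate from the loader means the same parsing code could back a file-based, embedded, or cached loader.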
Quick Start & Requirements
go get github.com/pkoukk/tiktoken-go
Set TIKTOKEN_CACHE_DIR to specify a cache directory.
Highlighted Details
Encodings: o200k_base, cl100k_base, p50k_base, r50k_base.
[...] tiktoken, with minor variations depending on the encoding and environment.
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
OpenAI may change how tokens are counted for chat API calls, and the provided counting function assumes specific model versions, so its results should be treated as estimates for other models. In benchmarks, the o200k_base encoding appears to be slower than cl100k_base.