Go library for OpenAI's tiktoken
This Go library provides a port of OpenAI's tiktoken BPE tokenizer, enabling Go developers to efficiently tokenize text for OpenAI models. It offers similar caching mechanisms and allows for custom BPE loaders, including an offline option for environments without runtime downloads.
How It Works
The library implements the Byte Pair Encoding (BPE) algorithm, mirroring the functionality of the original Python tiktoken. It caches token dictionaries locally, in a directory configurable via the TIKTOKEN_CACHE_DIR environment variable, to avoid repeated downloads. Users can also provide custom BPE loaders by implementing the BpeLoader interface, facilitating offline usage or alternative data sourcing.
Quick Start & Requirements
go get github.com/pkoukk/tiktoken-go
Set TIKTOKEN_CACHE_DIR to specify a cache directory.
Highlighted Details
Encodings: o200k_base, cl100k_base, p50k_base, r50k_base.
... tiktoken, with minor variations depending on the encoding and environment.
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
The token counting function for chat API calls is noted as potentially subject to change by OpenAI, and the provided implementation assumes specific model versions. The o200k_base encoding appears to be slower than cl100k_base in benchmarks.
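To show why chat token counts depend on model-specific assumptions, here is a minimal sketch of the usual counting scheme: each message carries a fixed overhead on top of its encoded fields, and the reply is primed with a few extra tokens. The overhead constants (3 per message, 1 per name, 3 for reply priming) follow OpenAI's published guidance for some chat model versions and are assumptions that may not hold for others. The tokenCounter interface, wordCounter, chatMessage, and numTokensFromMessages are hypothetical names for this sketch, and wordCounter is a whitespace-splitting placeholder, not a BPE encoder.

```go
package main

import (
	"fmt"
	"strings"
)

// tokenCounter is a hypothetical stand-in for a real BPE encoder.
type tokenCounter interface {
	Count(text string) int
}

// wordCounter counts whitespace-separated words. It is a placeholder
// so the sketch runs without external dependencies; it is NOT BPE.
type wordCounter struct{}

func (wordCounter) Count(text string) int { return len(strings.Fields(text)) }

// chatMessage is a hypothetical minimal chat message.
type chatMessage struct {
	Role, Name, Content string
}

// Assumed overheads for specific chat model versions; other models
// may need different values.
const (
	tokensPerMessage = 3
	tokensPerName    = 1
	replyPriming     = 3
)

// numTokensFromMessages sums per-message overhead plus the token
// counts of each field, then adds the reply-priming overhead.
func numTokensFromMessages(tc tokenCounter, msgs []chatMessage) int {
	total := 0
	for _, m := range msgs {
		total += tokensPerMessage
		total += tc.Count(m.Role) + tc.Count(m.Content)
		if m.Name != "" {
			total += tc.Count(m.Name) + tokensPerName
		}
	}
	return total + replyPriming
}

func main() {
	msgs := []chatMessage{
		{Role: "system", Content: "You are a helpful assistant."},
		{Role: "user", Name: "alice", Content: "Hello there"},
	}
	fmt.Println(numTokensFromMessages(wordCounter{}, msgs))
}
```

Keeping the overheads as named constants makes it easy to adjust them per model if the counting rules change.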
1 year ago
Inactive