tiktoken-go by pkoukk

Go library for OpenAI's tiktoken

Created 2 years ago
816 stars

Top 43.4% on SourcePulse

Project Summary

This Go library provides a port of OpenAI's tiktoken BPE tokenizer, enabling Go developers to efficiently tokenize text for OpenAI models. It offers similar caching mechanisms and allows for custom BPE loaders, including an offline option for environments without runtime downloads.
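
For readers new to BPE, the core merge loop can be sketched as follows. This is a minimal illustration using only the standard library, not the library's own code: the function name bpeEncode and the toy rank table are hypothetical, and real tokenizers first split text with a regular expression, which this sketch skips.

```go
package main

import "fmt"

// bpeEncode starts from single bytes and repeatedly merges the adjacent
// pair whose concatenation has the lowest rank, until no adjacent pair
// appears in ranks. It returns the rank of each remaining piece.
// Hypothetical helper for illustration only.
func bpeEncode(input string, ranks map[string]int) ([]int, error) {
	parts := make([]string, 0, len(input))
	for i := 0; i < len(input); i++ {
		parts = append(parts, input[i:i+1])
	}
	for {
		best, bestRank := -1, 0
		for i := 0; i+1 < len(parts); i++ {
			r, ok := ranks[parts[i]+parts[i+1]]
			if ok && (best == -1 || r < bestRank) {
				best, bestRank = i, r
			}
		}
		if best == -1 {
			break // no mergeable pair is left
		}
		// Merge the winning pair into one piece and close the gap.
		parts[best] += parts[best+1]
		parts = append(parts[:best+1], parts[best+2:]...)
	}
	tokens := make([]int, 0, len(parts))
	for _, p := range parts {
		r, ok := ranks[p]
		if !ok {
			return nil, fmt.Errorf("no rank for piece %q", p)
		}
		tokens = append(tokens, r)
	}
	return tokens, nil
}

func main() {
	// Toy rank table; real tables hold tens of thousands of entries or more.
	ranks := map[string]int{"a": 0, "b": 1, "c": 2, "ab": 3, "abc": 4}
	tokens, err := bpeEncode("abcab", ranks)
	if err != nil {
		panic(err)
	}
	fmt.Println(tokens)
}
```

Lower ranks win because a rank records how early a merge was learned, so the most common pairs are merged first.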

How It Works

The library implements the Byte Pair Encoding (BPE) algorithm, mirroring the functionality of the original Python tiktoken. It caches token dictionaries locally, configurable via the TIKTOKEN_CACHE_DIR environment variable, to avoid repeated downloads. Users can also provide custom BPE loaders by implementing the BpeLoader interface, facilitating offline usage or alternative data sourcing.
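
A custom loader might look like the sketch below, which serves a rank file from memory instead of downloading it. The interface shape (method name and signature) is an assumption and should be checked against the library source; memoryLoader, parseRanks, and the file name demo.tiktoken are hypothetical. The one-pair-per-line "<base64 token> <rank>" layout is the format used by published .tiktoken files.

```go
package main

import (
	"bufio"
	"encoding/base64"
	"fmt"
	"strconv"
	"strings"
)

// BpeLoader is declared locally so the sketch compiles on its own.
// Assumption: the library's interface has this shape.
type BpeLoader interface {
	LoadTiktokenBpe(tiktokenBpeFile string) (map[string]int, error)
}

// memoryLoader is a hypothetical loader backed by in-memory file contents.
type memoryLoader struct {
	files map[string]string // name -> "<base64 token> <rank>" lines
}

func (l memoryLoader) LoadTiktokenBpe(name string) (map[string]int, error) {
	contents, ok := l.files[name]
	if !ok {
		return nil, fmt.Errorf("no embedded rank file for %q", name)
	}
	return parseRanks(contents)
}

// parseRanks decodes one "<base64 token> <rank>" pair per line.
func parseRanks(contents string) (map[string]int, error) {
	ranks := make(map[string]int)
	scanner := bufio.NewScanner(strings.NewReader(contents))
	for scanner.Scan() {
		line := strings.TrimSpace(scanner.Text())
		if line == "" {
			continue
		}
		fields := strings.Fields(line)
		if len(fields) != 2 {
			return nil, fmt.Errorf("malformed line %q", line)
		}
		token, err := base64.StdEncoding.DecodeString(fields[0])
		if err != nil {
			return nil, err
		}
		rank, err := strconv.Atoi(fields[1])
		if err != nil {
			return nil, err
		}
		ranks[string(token)] = rank
	}
	return ranks, scanner.Err()
}

func main() {
	var loader BpeLoader = memoryLoader{files: map[string]string{
		"demo.tiktoken": "aGk= 0\nIQ== 1\n", // "hi" and "!"
	}}
	ranks, err := loader.LoadTiktokenBpe("demo.tiktoken")
	if err != nil {
		panic(err)
	}
	fmt.Println(len(ranks), ranks["hi"], ranks["!"])
}
```

In an offline build, the in-memory contents could come from files embedded at compile time, so nothing is fetched at runtime.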

Quick Start & Requirements

  • Install: go get github.com/pkoukk/tiktoken-go
  • Prerequisites: Go toolchain.
  • Cache: Token dictionaries are cached locally; set TIKTOKEN_CACHE_DIR to specify a cache directory.
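
To illustrate the cache setting, the sketch below shows one way a cache location could be resolved from TIKTOKEN_CACHE_DIR. Only the environment variable name comes from the project; the fallback directory, the SHA-1 file naming, and the function names resolveCacheDir and cachePath are assumptions made for this example.

```go
package main

import (
	"crypto/sha1"
	"encoding/hex"
	"fmt"
	"os"
	"path/filepath"
)

// resolveCacheDir returns the directory named by TIKTOKEN_CACHE_DIR, or a
// fallback under tempDir when the variable is unset.
// Assumption: the fallback directory name is invented for this sketch.
func resolveCacheDir(getenv func(string) string, tempDir string) string {
	if dir := getenv("TIKTOKEN_CACHE_DIR"); dir != "" {
		return dir
	}
	return filepath.Join(tempDir, "tiktoken-cache")
}

// cachePath derives a stable file name by hashing the dictionary's source
// URL. Assumption: the hashing scheme is illustrative.
func cachePath(cacheDir, sourceURL string) string {
	sum := sha1.Sum([]byte(sourceURL))
	return filepath.Join(cacheDir, hex.EncodeToString(sum[:]))
}

func main() {
	dir := resolveCacheDir(os.Getenv, os.TempDir())
	// example.com is a placeholder, not a real dictionary URL.
	fmt.Println(cachePath(dir, "https://example.com/cl100k_base.tiktoken"))
}
```

Passing the environment lookup in as a function keeps the resolution logic easy to test without touching the real process environment.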

Highlighted Details

  • Supports multiple OpenAI encodings: o200k_base, cl100k_base, p50k_base, r50k_base.
  • Includes a utility function for counting tokens in chat API calls, based on OpenAI cookbook examples.
  • Benchmarks indicate performance is comparable to the original tiktoken, with minor variations depending on the encoding and environment.
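
The chat token counting mentioned above can be sketched as follows. This is not the library's utility: the Message type and countChatTokens are hypothetical, the counter is passed in as a function so any tokenizer can back it, and the overheads (3 tokens per message, 1 per name, 3 to prime the reply) follow the OpenAI cookbook example for the gpt-3.5-turbo-0613 and gpt-4 families and may not hold for other models.

```go
package main

import (
	"fmt"
	"strings"
)

// Message is a hypothetical chat message type for this sketch.
type Message struct {
	Role    string
	Content string
	Name    string // optional
}

// countChatTokens estimates the prompt tokens for a list of chat messages.
// count returns the token count of a single string.
// Assumption: overhead constants are model-specific (see the note above).
func countChatTokens(messages []Message, count func(string) int) int {
	const (
		tokensPerMessage = 3
		tokensPerName    = 1
		replyPriming     = 3
	)
	total := replyPriming
	for _, m := range messages {
		total += tokensPerMessage + count(m.Role) + count(m.Content)
		if m.Name != "" {
			total += tokensPerName + count(m.Name)
		}
	}
	return total
}

func main() {
	// Placeholder counter: one token per whitespace-separated word.
	// A real caller would count tokens with the model's encoding instead.
	words := func(s string) int { return len(strings.Fields(s)) }
	msgs := []Message{
		{Role: "system", Content: "You are a helpful assistant."},
		{Role: "user", Content: "Hello there"},
	}
	fmt.Println(countChatTokens(msgs, words))
}
```

Because the per-message constants differ between model versions, it is safer to keep them next to the model name they were verified against.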

Maintenance & Community

  • The project is maintained by pkoukk.
  • Contributions are welcome via pull requests or issues, including benchmark improvements and feature requests.

Licensing & Compatibility

  • License: MIT.
  • The MIT license permits use in commercial and closed-source applications.

Limitations & Caveats

The chat token-counting function follows rules that OpenAI may change, and the provided implementation assumes specific model versions. In the project's benchmarks, the o200k_base encoding appears to be slower than cl100k_base.

Health Check

  • Last commit: 1 week ago
  • Responsiveness: Inactive
  • Pull requests (30d): 2
  • Issues (30d): 0
  • Star history: 9 stars in the last 30 days
