tiktoken-go  by pkoukk

Go library for OpenAI's tiktoken

created 2 years ago
802 stars

Top 44.9% on sourcepulse

Project Summary

This Go library is a port of OpenAI's tiktoken BPE tokenizer, letting Go developers tokenize text for OpenAI models. It caches token dictionaries in the same way as the Python original and accepts custom BPE loaders, including an offline loader for environments that cannot download dictionaries at runtime.

How It Works

The library implements the Byte Pair Encoding (BPE) algorithm, mirroring the functionality of the original Python tiktoken. It caches token dictionaries locally, configurable via the TIKTOKEN_CACHE_DIR environment variable, to avoid repeated downloads. Users can also provide custom BPE loaders by implementing the BpeLoader interface, facilitating offline usage or alternative data sourcing.
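The two ideas above, loading a rank dictionary from local data and applying BPE merges, can be sketched with the standard library alone. This is a simplified illustration, not the library's code: parseRanks and bpeEncode are hypothetical names, the five-entry dictionary is made up, and the sketch skips the regex pre-splitting that real tiktoken encodings apply before merging. It assumes the dictionary text uses one "<base64 token> <rank>" pair per line, as OpenAI's published .tiktoken files do.

```go
package main

import (
	"encoding/base64"
	"fmt"
	"strconv"
	"strings"
)

// parseRanks reads dictionary text in which each non-empty line is
// "<base64-encoded token bytes> <integer rank>".
func parseRanks(data string) (map[string]int, error) {
	ranks := make(map[string]int)
	for _, line := range strings.Split(data, "\n") {
		line = strings.TrimSpace(line)
		if line == "" {
			continue
		}
		fields := strings.Fields(line)
		if len(fields) != 2 {
			return nil, fmt.Errorf("malformed line %q", line)
		}
		token, err := base64.StdEncoding.DecodeString(fields[0])
		if err != nil {
			return nil, err
		}
		rank, err := strconv.Atoi(fields[1])
		if err != nil {
			return nil, err
		}
		ranks[string(token)] = rank
	}
	return ranks, nil
}

// bpeEncode starts from single bytes and repeatedly merges the adjacent pair
// with the lowest rank until no adjacent pair is in the dictionary.
// Pieces missing from the dictionary are reported as -1.
func bpeEncode(piece string, ranks map[string]int) []int {
	parts := make([]string, 0, len(piece))
	for i := 0; i < len(piece); i++ {
		parts = append(parts, piece[i:i+1])
	}
	for len(parts) > 1 {
		best, bestRank := -1, 0
		for i := 0; i+1 < len(parts); i++ {
			r, ok := ranks[parts[i]+parts[i+1]]
			if ok && (best == -1 || r < bestRank) {
				best, bestRank = i, r
			}
		}
		if best == -1 {
			break // nothing left to merge
		}
		parts[best] = parts[best] + parts[best+1]
		parts = append(parts[:best+1], parts[best+2:]...)
	}
	ids := make([]int, 0, len(parts))
	for _, p := range parts {
		id, ok := ranks[p]
		if !ok {
			id = -1
		}
		ids = append(ids, id)
	}
	return ids
}

func main() {
	// Made-up dictionary: "a"=0, "b"=1, "c"=2, "ab"=3, "abc"=4.
	const dict = "YQ== 0\nYg== 1\nYw== 2\nYWI= 3\nYWJj 4\n"
	ranks, err := parseRanks(dict)
	if err != nil {
		panic(err)
	}
	fmt.Println(bpeEncode("abcab", ranks)) // prints [4 3]
}
```

An offline loader would wrap a parser like this around dictionary data shipped with the application, so no network access is needed at runtime.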

Quick Start & Requirements

  • Install: go get github.com/pkoukk/tiktoken-go
  • Prerequisites: Go toolchain.
  • Cache: Token dictionaries are cached locally; set TIKTOKEN_CACHE_DIR to specify a cache directory.
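The cache setting can be pictured as a simple environment lookup. The sketch below is an assumption about the general pattern, not the library's implementation: resolveCacheDir is a hypothetical helper, and the fallback directory name is invented for illustration. Only the TIKTOKEN_CACHE_DIR variable comes from the project description.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// resolveCacheDir returns the directory named by TIKTOKEN_CACHE_DIR when it is
// set, and otherwise a subdirectory of the system temp dir.
// The fallback name "tiktoken-cache-example" is hypothetical.
func resolveCacheDir(getenv func(string) string) string {
	if dir := getenv("TIKTOKEN_CACHE_DIR"); dir != "" {
		return dir
	}
	return filepath.Join(os.TempDir(), "tiktoken-cache-example")
}

func main() {
	// Pass os.Getenv in real use; a stub makes the lookup easy to test.
	fmt.Println(resolveCacheDir(os.Getenv))
}
```

Taking the lookup function as a parameter keeps the helper testable without mutating the process environment.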

Highlighted Details

  • Supports multiple OpenAI encodings: o200k_base, cl100k_base, p50k_base, r50k_base.
  • Includes a utility function for counting tokens in chat API calls, based on OpenAI cookbook examples.
  • Benchmarks indicate performance is comparable to the original tiktoken, with minor variations depending on the encoding and environment.
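The chat token-counting approach from the OpenAI cookbook can be sketched as follows. This is an illustration rather than the library's utility: Message and countChatTokens are hypothetical names, the tokenizer is passed in as a plain function, and the per-message constants (3 per message, 1 per name, 3 for reply priming) are the values the cookbook gives for recent gpt-3.5-turbo and gpt-4 snapshots, so they may not hold for other models.

```go
package main

import (
	"fmt"
	"strings"
)

// Message is a minimal chat message; Name is optional.
type Message struct {
	Role    string
	Content string
	Name    string
}

// countChatTokens adds a fixed overhead per message, the token count of each
// field, an extra token when a name is present, and a final reply-priming cost.
func countChatTokens(msgs []Message, countTokens func(string) int) int {
	const (
		tokensPerMessage = 3 // assumed overhead per message
		tokensPerName    = 1 // assumed extra cost when Name is set
		replyPriming     = 3 // assumed cost of priming the assistant reply
	)
	total := 0
	for _, m := range msgs {
		total += tokensPerMessage
		total += countTokens(m.Role)
		total += countTokens(m.Content)
		if m.Name != "" {
			total += countTokens(m.Name) + tokensPerName
		}
	}
	return total + replyPriming
}

func main() {
	// Stand-in tokenizer: counts whitespace-separated words.
	// A real caller would count tokens with the model's BPE encoding.
	wordCount := func(s string) int { return len(strings.Fields(s)) }
	msgs := []Message{
		{Role: "system", Content: "You are a helpful assistant."},
		{Role: "user", Content: "Hello there", Name: "alice"},
	}
	fmt.Println(countChatTokens(msgs, wordCount))
}
```

Because the overhead constants depend on the model version, it is worth keeping them in one place where they can be updated.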

Maintenance & Community

  • The project is maintained by pkoukk.
  • Contributions are welcome through pull requests or issues, including benchmark improvements and feature requests.

Licensing & Compatibility

  • License: MIT.
  • Compatible with commercial and closed-source applications.

Limitations & Caveats

The chat token-counting function follows rules that OpenAI may change, and the provided implementation assumes specific model versions. In the project's benchmarks, the o200k_base encoding appears to be slower than cl100k_base.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: Inactive
  • Pull requests (30d): 3
  • Issues (30d): 1
  • Star history: 44 stars in the last 90 days
