tokenizer by tiktoken-go

Go port of OpenAI's tiktoken tokenizer

Created 2 years ago
387 stars

Top 74.0% on SourcePulse

Project Summary

This Go library provides a pure Go implementation of OpenAI's tiktoken tokenizer, enabling efficient text encoding and decoding for large language models within Go applications. It targets developers needing to integrate LLM tokenization capabilities directly into their Go services without external dependencies or Python runtimes.

How It Works

The library embeds OpenAI's vocabulary data directly in Go maps, so the vocabularies are compiled into the binary at build time. Because nothing has to be downloaded or cached at runtime, startup can be faster than in implementations, such as the Python one, that load vocabulary files from external sources. It supports multiple encoding types used by OpenAI models.

Quick Start & Requirements

  • Install: go get github.com/tiktoken-go/tokenizer
  • Requirements: Go toolchain.
  • Usage: Import github.com/tiktoken-go/tokenizer and call tokenizer.Get() with the desired encoding (e.g., tokenizer.Cl100kBase) to obtain an encoder. A CLI tool is also included for direct use.

Highlighted Details

  • Pure Go implementation, no Python dependency.
  • Embeds vocabularies in the binary, avoiding runtime downloads and speeding up startup.
  • Supports cl100k_base, o200k_base, r50k_base, p50k_base, p50k_edit encodings.
  • Includes a command-line interface for direct tokenization.

Maintenance & Community

The project appears to be actively maintained, with a clear list of completed and pending tasks in the README. No specific community channels or external contributors are highlighted.

Licensing & Compatibility

The README does not explicitly state a license. Given it's a port of OpenAI's tokenizer, users should verify licensing implications, especially for commercial use.

Limitations & Caveats

The library embeds roughly 4 MB of vocabulary data directly into the Go binary, which adds to the size of any program that imports it. Handling of special tokens and the gpt-2 model encoding are listed as pending.

Health Check

  • Last commit: 1 month ago
  • Responsiveness: 1 day
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 5 stars in the last 30 days
