Go NLP tokenizers for Hugging Face models
This Go package provides pure Go implementations of NLP tokenizers, inspired by Hugging Face's Tokenizers library. It lets Go developers use tokenizers for model training, testing, and inference directly within Go applications, which shortens the path to production software.
How It Works
The tokenizer is modular, with distinct subpackages for the Normalizer, Pretokenizer, Tokenizer, and Post-processing stages. It supports three tokenization models: Word Level, WordPiece, and Byte Pair Encoding (BPE). This design allows both training new tokenizers from scratch and fine-tuning existing ones, which makes the package adaptable to a range of NLP tasks.
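To make the four stages concrete, here is a minimal standard-library sketch of a normalize, pre-tokenize, WordPiece, post-process pipeline. It does not use this package's API: every function name, the toy vocabulary, and the BERT-style `[CLS]`/`[SEP]`/`[UNK]` tokens are assumptions chosen for illustration, and the real subpackages may behave differently.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// normalize lowercases the input; a stand-in for the Normalizer stage.
func normalize(s string) string {
	return strings.ToLower(s)
}

// preTokenize splits on whitespace and isolates each punctuation rune;
// a stand-in for the Pretokenizer stage.
func preTokenize(s string) []string {
	var words []string
	var cur []rune
	flush := func() {
		if len(cur) > 0 {
			words = append(words, string(cur))
			cur = cur[:0]
		}
	}
	for _, r := range s {
		switch {
		case unicode.IsSpace(r):
			flush()
		case unicode.IsPunct(r):
			flush()
			words = append(words, string(r))
		default:
			cur = append(cur, r)
		}
	}
	flush()
	return words
}

// wordPiece applies greedy longest-match-first lookup against vocab.
// Non-initial pieces carry the "##" continuation prefix; a word that
// cannot be fully covered maps to the single token "[UNK]".
func wordPiece(word string, vocab map[string]bool) []string {
	runes := []rune(word)
	var pieces []string
	for start := 0; start < len(runes); {
		end := len(runes)
		match := ""
		for ; end > start; end-- {
			cand := string(runes[start:end])
			if start > 0 {
				cand = "##" + cand
			}
			if vocab[cand] {
				match = cand
				break
			}
		}
		if match == "" {
			return []string{"[UNK]"}
		}
		pieces = append(pieces, match)
		start = end
	}
	return pieces
}

// postProcess wraps the sequence in BERT-style special tokens.
func postProcess(tokens []string) []string {
	out := append([]string{"[CLS]"}, tokens...)
	return append(out, "[SEP]")
}

// encode chains the four stages in order.
func encode(text string, vocab map[string]bool) []string {
	var tokens []string
	for _, w := range preTokenize(normalize(text)) {
		tokens = append(tokens, wordPiece(w, vocab)...)
	}
	return postProcess(tokens)
}

// toyVocab is a made-up vocabulary for illustration only.
var toyVocab = map[string]bool{
	"go": true, "##pher": true, "##s": true,
	"token": true, "##ize": true, "text": true, "!": true,
}

func main() {
	fmt.Println(strings.Join(encode("Gophers tokenize text!", toyVocab), " "))
	// prints: [CLS] go ##pher ##s token ##ize text ! [SEP]
}
```

Keeping each stage as a separate function mirrors the modular layout described above: swapping `wordPiece` for a word-level lookup or a BPE merge loop would leave the other three stages unchanged.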
Quick Start & Requirements
go get github.com/sugarme/tokenizer
Pretrained tokenizers (e.g. bert-base-uncased) can be loaded via the pretrained subpackage.
Highlighted Details
Maintenance & Community
No specific community channels, roadmap, or contributor information is detailed in the README.
Licensing & Compatibility
Limitations & Caveats
The README does not provide performance benchmarks, specify model compatibility beyond Hugging Face, or describe community support or a project roadmap.