mlc-ai: Cross-platform C++ tokenizer binding library for universal deployment
Top 71.1% on SourcePulse
This C++ library provides a unified, cross-platform interface for HuggingFace tokenizers and SentencePiece, targeting developers building native language model applications. It simplifies tokenizer deployment across diverse platforms like iOS, Android, Windows, Linux, and web browsers by offering a minimal C++ API with reduced dependencies.
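To show what a single interface over several tokenizer backends means for application code, here is a minimal sketch. It assumes a hypothetical abstract Tokenizer class with Encode/Decode methods and two toy backends (ByteTokenizer, WordTokenizer) that stand in for real HuggingFace and SentencePiece tokenizers. None of these names are the library's actual API. The point is that application code such as RoundTrip depends only on the interface, so swapping the backend does not change it.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical common interface (not the library's real API): every backend
// is reached through the same two calls.
class Tokenizer {
 public:
  virtual ~Tokenizer() = default;
  virtual std::vector<int32_t> Encode(const std::string& text) = 0;
  virtual std::string Decode(const std::vector<int32_t>& ids) = 0;
};

// Toy backend A: byte-level, one id (0..255) per input byte.
class ByteTokenizer : public Tokenizer {
 public:
  std::vector<int32_t> Encode(const std::string& text) override {
    std::vector<int32_t> ids;
    for (unsigned char c : text) ids.push_back(c);
    return ids;
  }
  std::string Decode(const std::vector<int32_t>& ids) override {
    std::string out;
    for (int32_t id : ids) {
      assert(id >= 0 && id < 256);
      out.push_back(static_cast<char>(id));
    }
    return out;
  }
};

// Toy backend B: whitespace-split words looked up in a fixed vocabulary;
// index 0 is treated as the unknown-word id.
class WordTokenizer : public Tokenizer {
 public:
  explicit WordTokenizer(std::vector<std::string> vocab) : vocab_(std::move(vocab)) {}
  std::vector<int32_t> Encode(const std::string& text) override {
    std::vector<int32_t> ids;
    std::istringstream in(text);
    std::string word;
    while (in >> word) {
      int32_t id = 0;
      for (size_t i = 0; i < vocab_.size(); ++i) {
        if (vocab_[i] == word) { id = static_cast<int32_t>(i); break; }
      }
      ids.push_back(id);
    }
    return ids;
  }
  std::string Decode(const std::vector<int32_t>& ids) override {
    std::string out;
    for (size_t i = 0; i < ids.size(); ++i) {
      if (i > 0) out += ' ';
      out += vocab_.at(static_cast<size_t>(ids[i]));
    }
    return out;
  }

 private:
  std::vector<std::string> vocab_;
};

enum class Backend { kByte, kWord };

// Factory: the caller chooses a backend once, then uses only the interface.
std::unique_ptr<Tokenizer> MakeTokenizer(Backend kind) {
  switch (kind) {
    case Backend::kByte:
      return std::make_unique<ByteTokenizer>();
    case Backend::kWord:
      return std::make_unique<WordTokenizer>(
          std::vector<std::string>{"<unk>", "hello", "world"});
  }
  return nullptr;
}

// Application code that never names a concrete backend.
std::string RoundTrip(Tokenizer& tok, const std::string& text) {
  return tok.Decode(tok.Encode(text));
}
```

For example, RoundTrip(*MakeTokenizer(Backend::kWord), "hello there") yields "hello <unk>", because "there" is not in the toy vocabulary, while the byte backend reproduces any input exactly.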
How It Works
The project wraps two existing libraries behind a common C++ interface: the Rust implementation of HuggingFace tokenizers, reached through a C binding, and the C++ SentencePiece library. It relies on Rust's performance and cross-compilation support, particularly for mobile targets and for the web via Emscripten. This design hides the differences between the individual tokenizer libraries and their build processes, so they can be integrated into C++ projects in a uniform way.
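A Rust library is typically exposed to C++ through a plain C ABI (an opaque handle plus free functions), which the C++ layer then wraps in RAII types. The sketch below illustrates that general pattern under stated assumptions: the function names toy_tokenizer_new, toy_tokenizer_free and toy_tokenizer_encode, the "query the size first" calling convention, and the g_live_handles counter are all hypothetical and are not the project's actual C API. The "Rust side" is written in C++ here only so the example compiles on its own.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <string>
#include <vector>

// Opaque handle type: C++ callers only ever hold a pointer to it.
struct RustTokenizerHandle;

// Hypothetical C ABI (names invented for this sketch).
extern "C" {
RustTokenizerHandle* toy_tokenizer_new(int32_t id_base);
void toy_tokenizer_free(RustTokenizerHandle* handle);
size_t toy_tokenizer_encode(RustTokenizerHandle* handle, const char* text,
                            size_t len, int32_t* out, size_t cap);
}

// ---- Stand-in for the Rust side, written in C++ to stay self-contained. ----
int g_live_handles = 0;  // toy instrumentation: handles not yet freed

struct RustTokenizerHandle {
  int32_t id_base;  // toy state: added to every byte value
};

extern "C" RustTokenizerHandle* toy_tokenizer_new(int32_t id_base) {
  ++g_live_handles;
  return new RustTokenizerHandle{id_base};
}

extern "C" void toy_tokenizer_free(RustTokenizerHandle* handle) {
  if (handle != nullptr) --g_live_handles;
  delete handle;
}

// Writes at most `cap` ids into `out` and returns how many ids the full
// encoding needs, so callers can query the size first with cap == 0.
extern "C" size_t toy_tokenizer_encode(RustTokenizerHandle* handle, const char* text,
                                       size_t len, int32_t* out, size_t cap) {
  for (size_t i = 0; i < len && i < cap; ++i) {
    out[i] = handle->id_base + static_cast<unsigned char>(text[i]);
  }
  return len;
}

// ---- C++ side: RAII wrapper over the C ABI. ----
struct HandleDeleter {
  void operator()(RustTokenizerHandle* handle) const { toy_tokenizer_free(handle); }
};

class RustBackedTokenizer {
 public:
  explicit RustBackedTokenizer(int32_t id_base) : handle_(toy_tokenizer_new(id_base)) {}

  std::vector<int32_t> Encode(const std::string& text) const {
    // First call: ask how many ids are needed. Second call: fill the buffer.
    size_t needed = toy_tokenizer_encode(handle_.get(), text.data(), text.size(), nullptr, 0);
    std::vector<int32_t> ids(needed);
    size_t written = toy_tokenizer_encode(handle_.get(), text.data(), text.size(),
                                          ids.data(), ids.size());
    assert(written == needed);
    (void)written;  // avoids an unused-variable warning when NDEBUG disables assert
    return ids;
  }

 private:
  // Frees the foreign object exactly once, when the wrapper is destroyed.
  std::unique_ptr<RustTokenizerHandle, HandleDeleter> handle_;
};
```

Holding the handle in a std::unique_ptr with a custom deleter makes the wrapper move-only and guarantees the foreign object is released exactly once, through the same library that allocated it.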
Quick Start & Requirements
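Integrate the library into a CMake project with add_subdirectory.
A Rust toolchain is required; when cross-compiling, additional Rust targets (e.g., rustup target add aarch64-apple-ios) may be needed.
See the example folder for a CMake project example.
Highlighted Details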
Builds three static libraries: libtokenizers_c.a (Rust binding), libsentencepiece.a (SentencePiece), and libtokenizers_cpp.a (C++ binding).
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
The README does not explicitly state the license of the project itself, only that it builds upon other libraries. It focuses on static library generation, and dynamic linking options are not mentioned.
2 months ago
1+ week
guillaume-be
QwenLM
huggingface
huggingface
huggingface
SillyTavern
kaldi-asr