JS tokenizer for LLaMA 1 and 2 models, client-side in browser
This library provides a JavaScript tokenizer for LLaMA 1 and LLaMA 2 models, enabling accurate client-side token counting in web applications. It is a lightweight, zero-dependency solution for developers who need to enforce token limits or analyze text length in the browser or in Node.js.
How It Works
The tokenizer implements a Byte-Pair Encoding (BPE) algorithm, optimized for both runtime performance and minimal bundle size. It bakes the necessary vocabulary and merge data into a single, base64-encoded JavaScript file, eliminating external dependencies and simplifying integration. This approach ensures efficient processing directly within the client's environment.
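To illustrate the core idea, here is a minimal sketch of greedy BPE merging. This is not the library's actual implementation: the real tokenizer uses LLaMA's full vocabulary and merge ranks (baked into the bundle as base64 data), whereas this sketch uses a tiny hypothetical merge table and starts from characters rather than bytes.

```javascript
// Hypothetical merge table: pair -> rank (lower rank = merged earlier).
const mergeRanks = new Map([
  ["H e", 0],
  ["l l", 1],
  ["He ll", 2],
  ["Hell o", 3],
]);

function bpeEncode(word) {
  // Start from individual characters (real LLaMA BPE operates on bytes).
  let parts = [...word];
  while (true) {
    // Find the adjacent pair with the lowest merge rank.
    let best = null;
    for (let i = 0; i < parts.length - 1; i++) {
      const rank = mergeRanks.get(parts[i] + " " + parts[i + 1]);
      if (rank !== undefined && (best === null || rank < best.rank)) {
        best = { i, rank };
      }
    }
    if (best === null) break; // no applicable merges left
    // Merge the winning pair into a single token and repeat.
    parts = [
      ...parts.slice(0, best.i),
      parts[best.i] + parts[best.i + 1],
      ...parts.slice(best.i + 2),
    ];
  }
  return parts;
}

console.log(bpeEncode("Hello")); // merges all the way down to ["Hello"]
```

A production tokenizer would then map each resulting token string to its integer ID via the vocabulary; the merge loop above is the part that dominates runtime and is what the library optimizes.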
Quick Start & Requirements
Install via npm:

```shell
npm install llama-tokenizer-js
```

Then import the tokenizer and count tokens:

```javascript
import llamaTokenizer from 'llama-tokenizer-js';

console.log(llamaTokenizer.encode("Hello world!").length);
```
Loading via a <script> tag is also supported.

Highlighted Details
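For browser use without a bundler, the tokenizer can be pulled in as an ES module. The CDN path below is hypothetical and shown only for illustration; consult the project README for the actual distribution URL.

```html
<!-- Hypothetical URL: substitute the real module path from the README. -->
<script type="module">
  import llamaTokenizer from 'https://cdn.example.com/llama-tokenizer-js/llama-tokenizer.js';
  console.log(llamaTokenizer.encode("Hello world!").length);
</script>
```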
Maintenance & Community
Developed by belladore.ai with contributions from xenova, blaze2004, imoneoi, and ConProgramming. An example demo/playground is available.
Licensing & Compatibility
The README does not explicitly state a license. Compatibility is confirmed for LLaMA models based on Facebook's March 2023 (LLaMA 1) and July 2023 (LLaMA 2) checkpoints. OpenLLaMA models are noted as incompatible.
Limitations & Caveats
The library is designed for LLaMA 1 and 2; LLaMA 3 support lives in a separate repository. Training is not supported. Using a custom tokenizer requires manually swapping in its vocabulary and merge data.