GPT-3-Encoder by latitudegames

JS library for GPT-2/GPT-3 text tokenization

created 4 years ago
719 stars

Top 48.8% on sourcepulse

Project Summary

This repository provides a JavaScript implementation of the Byte Pair Encoding (BPE) encoder/decoder used by OpenAI's GPT-2 and GPT-3 models. It allows developers to tokenize and detokenize text directly within JavaScript environments, such as web browsers or Node.js applications, enabling client-side processing or integration with JavaScript-based AI workflows.

How It Works

The library implements BPE, which splits text into subword units (tokens). BPE starts from individual bytes and repeatedly merges the most frequent adjacent pair into a new token, so common words end up as single tokens while rare words and novel character sequences can still be represented as several smaller pieces. This keeps the vocabulary size bounded while still covering arbitrary input, a key technique for efficient natural language processing with large language models. The JavaScript implementation mirrors OpenAI's original Python encoder/decoder.
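
The merge idea described above can be sketched in a few lines. This is a toy illustration, not the library's code: it works on characters rather than bytes, it learns its own merge rules from a sample string instead of loading a pre-trained merge table, and the function names (countPairs, mergePair, learnMerges, applyMerges) are hypothetical.

```javascript
// Toy BPE sketch. All names here are illustrative, not gpt-3-encoder APIs.

// Count how often each adjacent pair of symbols occurs.
function countPairs(symbols) {
  const counts = new Map();
  for (let i = 0; i < symbols.length - 1; i++) {
    const key = JSON.stringify([symbols[i], symbols[i + 1]]);
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  return counts;
}

// Replace every non-overlapping occurrence of the pair (a, b) with a + b.
function mergePair(symbols, a, b) {
  const out = [];
  let i = 0;
  while (i < symbols.length) {
    if (i < symbols.length - 1 && symbols[i] === a && symbols[i + 1] === b) {
      out.push(a + b);
      i += 2;
    } else {
      out.push(symbols[i]);
      i += 1;
    }
  }
  return out;
}

// Learn up to `numMerges` merge rules from `text`, starting from characters.
function learnMerges(text, numMerges) {
  let symbols = Array.from(text);
  const merges = [];
  for (let step = 0; step < numMerges; step++) {
    let bestKey = null;
    let bestCount = 1; // only merge pairs that occur at least twice
    for (const [key, count] of countPairs(symbols)) {
      if (count > bestCount) {
        bestKey = key;
        bestCount = count;
      }
    }
    if (bestKey === null) break; // nothing left worth merging
    const [a, b] = JSON.parse(bestKey);
    merges.push([a, b]);
    symbols = mergePair(symbols, a, b);
  }
  return merges;
}

// Tokenize new text by replaying the learned merges in the order learned.
function applyMerges(text, merges) {
  let symbols = Array.from(text);
  for (const [a, b] of merges) {
    symbols = mergePair(symbols, a, b);
  }
  return symbols;
}
```

For example, learnMerges('aaabdaaabac', 3) learns the rules a+a, aa+a, and aaa+b, after which applyMerges('aaab', merges) yields the single token 'aaab' while applyMerges('aab', merges) yields 'aa' and 'b'. A production tokenizer for GPT-2/GPT-3 does not train at run time; it applies a fixed merge table that ships with the model.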

Quick Start & Requirements

  • Install: npm install gpt-3-encoder
  • Requirements: Node.js >= 12.
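  • Usage: Import the encode and decode functions; encode turns a string into an array of token IDs, and decode turns such an array back into text. See the README for an example.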
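
A common use of the two functions is counting tokens and checking that text survives a round trip. The sketch below assumes only that encode takes a string and returns an array of token IDs and that decode reverses it; the helper tokenSummary and its codec parameter are hypothetical and not part of the package. The require call is left in a comment so the snippet loads even before the package is installed.

```javascript
// Hypothetical helper, not part of gpt-3-encoder.
// `codec` is any object with encode(text) -> number[] and decode(tokens) -> string.
function tokenSummary(text, codec) {
  const tokens = codec.encode(text);
  return {
    tokens,                                    // the token IDs
    count: tokens.length,                      // how many tokens the text uses
    roundTrips: codec.decode(tokens) === text, // true if decoding restores the input
  };
}

// With the package installed (npm install gpt-3-encoder):
//   const { encode, decode } = require('gpt-3-encoder');
//   console.log(tokenSummary('Hello, world!', { encode, decode }));
```

Passing the codec in as an argument keeps the helper independent of how the package is loaded, which can be convenient when the same code runs in both Node.js and a bundled browser build.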

Highlighted Details

  • JavaScript implementation of GPT-2/GPT-3 BPE encoding/decoding.
  • Enables client-side tokenization for web applications.
  • Compatible with Node.js environments.

Maintenance & Community

This project appears to be a direct port of OpenAI's encoder and has not shown significant recent activity or community engagement.

Licensing & Compatibility

  • License: MIT License.
  • Compatibility: Permissive license allows for commercial use and integration into closed-source projects.

Limitations & Caveats

The project is a direct port and may not include optimizations or features found in more actively maintained libraries. Its utility is primarily for environments where a pure JavaScript solution is required.

Health Check

  • Last commit: 2 years ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 1 star in the last 90 days

Explore Similar Projects

Starred by Carol Willing (Core Contributor to CPython, Jupyter), Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), and 7 more.

minbpe by karpathy

Minimal BPE encoder/decoder for LLM tokenization

created 1 year ago, updated 1 year ago
10k stars

Top 0.2% on sourcepulse