blt by facebookresearch

Code for the Byte Latent Transformer research paper

Created 7 months ago · 1,760 stars · Top 24.9% on sourcepulse

View on GitHub
Project Summary

Byte Latent Transformer (BLT) introduces a byte-level Large Language Model (LLM) architecture that matches the performance of tokenization-based LLMs at scale while offering significant gains in inference efficiency and robustness. It is aimed at researchers and practitioners exploring efficient, robust LLM architectures that operate directly on raw bytes.

How It Works

BLT groups raw bytes into dynamically sized patches, using next-byte entropy to decide where patches begin and end so that more compute is allocated where the data is more complex. The approach, detailed in the paper "Byte Latent Transformer: Patches Scale Better Than Tokens," adds new attention mechanisms that improve information flow between byte and patch representations, along with a new type of byte-sequence memory. Because patching is dynamic, average patch length can be longer during training and inference, contributing to the efficiency gains.
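To make the patching idea concrete, the sketch below is a deliberately simplified illustration rather than the repository's implementation: BLT scores next-byte entropy with a small byte-level language model, whereas this toy uses the empirical entropy of the bytes accumulated in the current patch. The names empirical_entropy and segment_into_patches, the threshold, and the minimum patch length are all hypothetical.

```python
# Illustrative sketch of entropy-driven patching (NOT the paper's implementation).
# BLT uses a small byte-level LM to score next-byte entropy; this toy instead
# uses the empirical entropy of the bytes collected so far in the current patch.
# The control flow shows the same idea: keep extending a patch while the data
# looks predictable, and cut a new patch once entropy crosses a threshold.
import math
from collections import Counter

def empirical_entropy(chunk: bytes) -> float:
    """Shannon entropy (bits per byte) of the byte histogram of `chunk`."""
    counts = Counter(chunk)
    total = len(chunk)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def segment_into_patches(data: bytes, threshold: float = 2.5,
                         min_patch: int = 2) -> list[bytes]:
    """Greedily split `data` into patches, cutting whenever the running
    entropy of the current patch exceeds `threshold`."""
    patches, current = [], bytearray()
    for b in data:
        current.append(b)
        if len(current) >= min_patch and empirical_entropy(bytes(current)) > threshold:
            patches.append(bytes(current))
            current = bytearray()
    if current:
        patches.append(bytes(current))
    return patches

if __name__ == "__main__":
    text = b"aaaaaaaaaa Byte Latent Transformer! \x00\x7f\x11\x9c"
    for p in segment_into_patches(text):
        print(p, f"H={empirical_entropy(p):.2f}")
```

The effect is the same in spirit as the paper's scheme: runs of predictable bytes end up in long patches, while high-entropy regions are cut into shorter patches that receive more compute.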

Quick Start & Requirements

  • Environment Setup: Clone the repository and run bash setup/create_env.sh or sbatch setup/create_env.sh for SLURM clusters. Activate the environment with conda activate blt_<date>.
  • Model Weights: Request access to the BLT 1B or 7B model weights on Hugging Face, log in via huggingface-cli login, download them with python download_blt_weights.py, and run the demo with python demo.py "A BLT has" (these steps are chained in the sketch after this list).
  • Data Preparation: Use python setup/download_prepare_hf_data.py for datasets like fineweb_edu. Tokenizer download requires python setup/download_tokenizer.py.
  • Training/Inference: Launch jobs using stool (SLURM) or torchrun for local execution. Configuration files are provided for debugging and training.
  • Hardware: Testing has been done on H100 GPUs; guidance for other hardware is limited.
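For a quick end-to-end check, the weight-download and demo steps can be chained in a short script. The sketch below is not part of the repository; it simply wraps the commands documented above with subprocess and assumes the conda environment from setup/create_env.sh is already activated and that access to the weights has been granted on Hugging Face.

```python
# Convenience sketch (not part of the repository): chains the documented
# Hugging Face login, weight download, and demo steps. Assumes the conda
# environment created by setup/create_env.sh (blt_<date>) is already active.
import subprocess

def run(cmd: list[str]) -> None:
    """Echo a command and run it, stopping on the first failure."""
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

if __name__ == "__main__":
    run(["huggingface-cli", "login"])           # one-time authentication
    run(["python", "download_blt_weights.py"])  # fetch the released checkpoints
    run(["python", "demo.py", "A BLT has"])     # sanity-check generation
```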

Highlighted Details

  • Achieves tokenization-based LLM performance at scale.
  • Offers significant inference efficiency and robustness improvements.
  • First scaling study of byte-level models up to 8B parameters and 8T training bytes.
  • Dynamic patching based on byte entropy allocates compute where complexity is higher.

Maintenance & Community

The project is being actively updated to improve reproducibility. The README does not link to community channels or a roadmap.

Licensing & Compatibility

The BLT code is released under the CC-BY-NC 4.0 license, which prohibits commercial use. The project builds in part on Meta Lingua, and the authors ask that both be cited.

Limitations & Caveats

The code is still under active development toward full reproducibility, and some data preparation instructions are not yet well tested. The CC-BY-NC 4.0 license effectively rules out commercial adoption.

Health Check

  • Last commit: 2 months ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 1
  • Issues (30d): 0
  • Star History: 221 stars in the last 90 days

Explore Similar Projects

Starred by Stas Bekman (Author of Machine Learning Engineering Open Book; Research Engineer at Snowflake).

HALOs by ContextualAI · 873 stars (0.2%)
Library for aligning LLMs using human-aware loss functions
Created 1 year ago · updated 2 weeks ago

Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems) and Jeremy Howard (Cofounder of fast.ai).

GPTFast by MDK8888 · 685 stars (0%)
HF Transformers accelerator for faster inference
Created 1 year ago · updated 11 months ago

Starred by Stas Bekman (Author of Machine Learning Engineering Open Book; Research Engineer at Snowflake) and Travis Fischer (Founder of Agentic).

lingua by facebookresearch · 5k stars (0.1%)
LLM research codebase for training and inference
Created 9 months ago · updated 2 weeks ago