blt by facebookresearch

Code for the Byte Latent Transformer research paper

Created 7 months ago · 1,760 stars · Top 24.9% on sourcepulse

View on GitHub
Project Summary

Byte Latent Transformer (BLT) introduces a byte-level Large Language Model (LLM) architecture that matches the performance of tokenization-based LLMs at scale while offering significant gains in inference efficiency and robustness. It is aimed at researchers and practitioners exploring efficient, robust LLM architectures that operate directly on raw bytes.

How It Works

BLT groups raw bytes into dynamically sized patches, using next-byte entropy to decide where patches begin and end so that more compute is allocated where the data is more complex. The approach, detailed in the paper "Byte Latent Transformer: Patches Scale Better Than Tokens," adds new attention mechanisms that improve information flow between byte and patch representations, along with a new type of byte-sequence memory. Because patching is dynamic, average patch length can be longer during training and inference, contributing to the efficiency gains.
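To make the patching idea concrete, the sketch below is a deliberately simplified illustration rather than the repository's implementation: BLT scores next-byte entropy with a small byte-level language model, whereas this toy uses the empirical entropy of the bytes accumulated in the current patch. The names empirical_entropy and segment_into_patches, the threshold, and the minimum patch length are all hypothetical.

```python
# Illustrative sketch of entropy-driven patching (NOT the paper's implementation).
# BLT uses a small byte-level LM to score next-byte entropy; this toy instead
# uses the empirical entropy of the bytes collected so far in the current patch.
# The control flow shows the same idea: keep extending a patch while the data
# looks predictable, and cut a new patch once entropy crosses a threshold.
import math
from collections import Counter

def empirical_entropy(chunk: bytes) -> float:
    """Shannon entropy (bits per byte) of the byte histogram of `chunk`."""
    counts = Counter(chunk)
    total = len(chunk)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def segment_into_patches(data: bytes, threshold: float = 2.5,
                         min_patch: int = 2) -> list[bytes]:
    """Greedily split `data` into patches, cutting whenever the running
    entropy of the current patch exceeds `threshold`."""
    patches, current = [], bytearray()
    for b in data:
        current.append(b)
        if len(current) >= min_patch and empirical_entropy(bytes(current)) > threshold:
            patches.append(bytes(current))
            current = bytearray()
    if current:
        patches.append(bytes(current))
    return patches

if __name__ == "__main__":
    text = b"aaaaaaaaaa Byte Latent Transformer! \x00\x7f\x11\x9c"
    for p in segment_into_patches(text):
        print(p, f"H={empirical_entropy(p):.2f}")
```

The effect is the same in spirit as the paper's scheme: runs of predictable bytes end up in long patches, while high-entropy regions are cut into shorter patches that receive more compute.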

Quick Start & Requirements

  • Environment Setup: Clone the repository and run bash setup/create_env.sh or sbatch setup/create_env.sh for SLURM clusters. Activate the environment with conda activate blt_<date>.
  • Model Weights: Request access to the BLT 1B or 7B model weights on Hugging Face, log in via huggingface-cli login, download them with python download_blt_weights.py, and run the demo with python demo.py "A BLT has" (these steps are chained in the sketch after this list).
  • Data Preparation: Use python setup/download_prepare_hf_data.py for datasets like fineweb_edu. Tokenizer download requires python setup/download_tokenizer.py.
  • Training/Inference: Launch jobs using stool (SLURM) or torchrun for local execution. Configuration files are provided for debugging and training.
  • Hardware: Testing has been done on H100 GPUs; guidance for other hardware is limited.
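For a quick end-to-end check, the weight-download and demo steps can be chained in a short script. The sketch below is not part of the repository; it simply wraps the commands documented above with subprocess and assumes the conda environment from setup/create_env.sh is already activated and that access to the weights has been granted on Hugging Face.

```python
# Convenience sketch (not part of the repository): chains the documented
# Hugging Face login, weight download, and demo steps. Assumes the conda
# environment created by setup/create_env.sh (blt_<date>) is already active.
import subprocess

def run(cmd: list[str]) -> None:
    """Echo a command and run it, stopping on the first failure."""
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

if __name__ == "__main__":
    run(["huggingface-cli", "login"])           # one-time authentication
    run(["python", "download_blt_weights.py"])  # fetch the released checkpoints
    run(["python", "demo.py", "A BLT has"])     # sanity-check generation
```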

Highlighted Details

  • Achieves tokenization-based LLM performance at scale.
  • Offers significant inference efficiency and robustness improvements.
  • First scaling study of byte-level models up to 8B parameters and 8T training bytes.
  • Dynamic patching based on byte entropy allocates compute where complexity is higher.

Maintenance & Community

The project is being actively updated to improve reproducibility. The README does not link to community channels or a roadmap.

Licensing & Compatibility

The BLT code is released under the CC-BY-NC 4.0 license, which prohibits commercial use. The project builds in part on Meta Lingua, and the authors ask that both be cited.

Limitations & Caveats

The code is still under active development toward full reproducibility, and some data preparation instructions are not yet well tested. The CC-BY-NC 4.0 license effectively rules out commercial adoption.

Health Check

  • Last commit: 2 months ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 1
  • Issues (30d): 0
  • Star History: 221 stars in the last 90 days

Explore Similar Projects

Starred by Stas Bekman (Author of Machine Learning Engineering Open Book; Research Engineer at Snowflake).

HALOs by ContextualAI · 873 stars (0.2%)
Library for aligning LLMs using human-aware loss functions
Created 1 year ago · updated 2 weeks ago

Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems) and Jeremy Howard (Cofounder of fast.ai).

GPTFast by MDK8888 · 685 stars (0%)
HF Transformers accelerator for faster inference
Created 1 year ago · updated 11 months ago

Starred by Stas Bekman (Author of Machine Learning Engineering Open Book; Research Engineer at Snowflake) and Travis Fischer (Founder of Agentic).

lingua by facebookresearch · 5k stars (0.1%)
LLM research codebase for training and inference
Created 9 months ago · updated 2 weeks ago