TokenFormer by Haiyang-W

Research paper on a fully attention-based neural network with tokenized model parameters

created 9 months ago
567 stars

Top 57.6% on sourcepulse

View on GitHub
Project Summary

TokenFormer introduces a novel, fully attention-based neural network architecture that tokenizes model parameters, enabling flexible and scalable Transformer designs. It targets researchers and practitioners seeking to enhance Transformer efficiency and adaptability, offering a unified approach to token-token and token-parameter interactions.

How It Works

TokenFormer reimagines the Transformer by treating model parameters as attendable tokens alongside the input data tokens. The attention mechanism then mediates interactions between data and parameters, enabling dynamic, data-dependent parameter updates. This design aims to maximize architectural flexibility: by varying the token types and their interactions, it can express diverse network families, including RNN-like structures (e.g., Mamba) and test-time-training (TTT) networks.
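To make the token-parameter interaction concrete, the sketch below implements a layer whose weights are a set of learnable key/value parameter tokens that the input tokens attend over. This is a minimal sketch rather than the repository's implementation: it uses a plain scaled softmax for score normalization (the paper describes a modified normalization), and all names and sizes are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TokenParameterAttention(nn.Module):
    """Sketch: a layer whose weights are attendable parameter tokens."""

    def __init__(self, dim_in, dim_out, num_param_tokens):
        super().__init__()
        # Learnable key/value parameter tokens stand in for a fixed weight matrix.
        self.param_keys = nn.Parameter(torch.randn(num_param_tokens, dim_in) * 0.02)
        self.param_values = nn.Parameter(torch.randn(num_param_tokens, dim_out) * 0.02)

    def forward(self, x):
        # x: (batch, seq_len, dim_in) -- input data tokens act as queries.
        scores = x @ self.param_keys.t()                      # (batch, seq_len, num_param_tokens)
        weights = F.softmax(scores / x.shape[-1] ** 0.5, dim=-1)
        return weights @ self.param_values                    # (batch, seq_len, dim_out)

# Example: map 16 tokens of width 128 to width 256 via 64 parameter tokens.
layer = TokenParameterAttention(dim_in=128, dim_out=256, num_param_tokens=64)
out = layer(torch.randn(2, 16, 128))
print(out.shape)  # torch.Size([2, 16, 256])
```

Because the number of parameter tokens is independent of the input and output widths, a layer like this can be grown by appending parameter tokens rather than reshaping weight matrices, which is the property the scaling features below rely on.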

Quick Start & Requirements

  • Installation: Clone the repository, create a conda environment with Python 3.8, install PyTorch 2.2.1 with CUDA 12.1 support, and then install dependencies via pip install -r requirements/requirements.txt. Optional requirement files cover flash attention, wandb, tensorboard, comet, and apex.
  • Prerequisites: Python 3.8, CUDA 12.x, PyTorch 1.8+, a Rust toolchain (some dependencies may need cargo to build), and mpi4py (version 3.0.3 recommended). A quick sanity check is sketched after this list.
  • Resources: Evaluation has been tested on a single GPU. Training examples cover single-node (8-GPU) and multi-node (Slurm) setups.
  • Links: Project Page, HuggingFace Weights, arXiv.
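Before launching training, a quick environment check can confirm that the interpreter and CUDA build match the tested setup. This is an optional sanity check, not part of the repository; the expected versions are taken from the README's install steps and may differ on your machine.

```python
# Optional environment sanity check (expected values follow the README's tested setup).
import sys
import torch

print("python:", sys.version.split()[0])         # README targets Python 3.8
print("torch:", torch.__version__)               # install step uses PyTorch 2.2.1
print("cuda available:", torch.cuda.is_available())
print("cuda build:", torch.version.cuda)         # install step targets CUDA 12.1
```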

Highlighted Details

  • ICLR 2025 Spotlight presentation.
  • Native scalability through tokenized parameters.
  • Supports incremental model scaling, reducing training costs (a minimal sketch follows this list).
  • Pretrained models available for language modeling (150M to 1.5B parameters) on the Pile dataset.
  • Codebase is clean, concise, and relies on minimal dependencies.
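To make the incremental-scaling idea concrete, the sketch below grows a layer by appending new parameter tokens to its existing key/value sets. The zero initialization is an assumption chosen so the new tokens contribute nothing at first; whether the grown model exactly reproduces the smaller one depends on the score normalization used (the paper's modified normalization rather than the plain softmax of the earlier sketch), so treat this as illustrative rather than the repository's recipe.

```python
import torch
import torch.nn as nn

def grow_parameter_tokens(param_keys, param_values, num_new):
    """Illustrative incremental scaling: append zero-initialized parameter tokens.

    param_keys:   (num_param_tokens, dim_in)  existing key parameter tokens
    param_values: (num_param_tokens, dim_out) existing value parameter tokens
    """
    new_keys = torch.zeros(num_new, param_keys.shape[1])
    new_values = torch.zeros(num_new, param_values.shape[1])
    grown_keys = nn.Parameter(torch.cat([param_keys.detach(), new_keys], dim=0))
    grown_values = nn.Parameter(torch.cat([param_values.detach(), new_values], dim=0))
    return grown_keys, grown_values

# Example: grow a layer from 64 to 96 parameter tokens.
keys, values = nn.Parameter(torch.randn(64, 128)), nn.Parameter(torch.randn(64, 256))
keys, values = grow_parameter_tokens(keys, values, num_new=32)
print(keys.shape, values.shape)  # torch.Size([96, 128]) torch.Size([96, 256])
```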

Maintenance & Community

The project is led by Haiyang Wang and Bernt Schiele. News and updates are shared via GitHub releases.

Licensing & Compatibility

The repository does not explicitly state a license in the README. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The codebase was developed and tested on Python 3.8; compatibility with newer Python versions may be limited by its dependencies. The authors note that the training code was released after only limited testing, so issues may remain. Visual modeling benchmarks are slated for a later release.

Health Check

  • Last commit: 5 months ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 0

Star History

12 stars in the last 90 days

Explore Similar Projects

Starred by Jeremy Howard (Cofounder of fast.ai) and Stas Bekman (Author of Machine Learning Engineering Open Book; Research Engineer at Snowflake).

SwissArmyTransformer by THUDM

0.3% · 1k stars
Transformer library for flexible model development
created 3 years ago · updated 7 months ago
Starred by Andrej Karpathy (Founder of Eureka Labs; Formerly at Tesla, OpenAI; Author of CS 231n), George Hotz (Author of tinygrad; Founder of the tiny corp, comma.ai), and 10 more.

TinyLlama by jzhang38

0.3% · 9k stars
Tiny pretraining project for a 1.1B Llama model
created 1 year ago · updated 1 year ago
Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems) and Jiayi Pan (Author of SWE-Gym; AI Researcher at UC Berkeley).

DeepSeek-Coder-V2 by deepseek-ai

0.4% · 6k stars
Open-source code language model comparable to GPT4-Turbo
created 1 year ago · updated 10 months ago
Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), Jeff Hammerbacher (Cofounder of Cloudera), and 10 more.

open-r1 by huggingface

0.2% · 25k stars
SDK for reproducing DeepSeek-R1
created 6 months ago · updated 3 days ago