Research paper on a fully attention-based neural network with tokenized model parameters
Top 57.6% on sourcepulse
TokenFormer introduces a novel, fully attention-based neural network architecture that tokenizes model parameters, enabling flexible and scalable Transformer designs. It targets researchers and practitioners seeking to enhance Transformer efficiency and adaptability, offering a unified approach to token-token and token-parameter interactions.
How It Works
TokenFormer reimagines the Transformer by treating model parameters as attendable tokens alongside input data tokens. This allows the attention mechanism to mediate interactions between data and parameters, facilitating dynamic, data-dependent parameter updates. This approach aims to maximize architectural flexibility, allowing for the construction of diverse network types, including RNN-like structures (e.g., Mamba) or TTT networks, by manipulating token types and their interactions.
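The core operation can be pictured as input tokens attending to learnable key/value "parameter tokens" in place of fixed projection matrices. The sketch below is a minimal, illustrative approximation in PyTorch, not the repository's exact layer: the class name Pattention and its arguments are hypothetical, and the paper's modified normalization is simplified here to a plain softmax.

import torch
import torch.nn as nn
import torch.nn.functional as F

class Pattention(nn.Module):
    # Token-parameter attention sketch: input tokens attend to learnable
    # key/value parameter tokens instead of being multiplied by fixed weights.
    def __init__(self, dim_in: int, dim_out: int, num_param_tokens: int):
        super().__init__()
        # Learnable parameter tokens: keys score the input, values form the output.
        self.key_params = nn.Parameter(torch.randn(num_param_tokens, dim_in) * dim_in ** -0.5)
        self.value_params = nn.Parameter(torch.randn(num_param_tokens, dim_out) * dim_out ** -0.5)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, dim_in)
        scores = x @ self.key_params.t()   # (batch, seq_len, num_param_tokens)
        attn = F.softmax(scores, dim=-1)   # data-dependent weighting of parameter tokens
        return attn @ self.value_params    # (batch, seq_len, dim_out)

# Usage sketch
layer = Pattention(dim_in=64, dim_out=64, num_param_tokens=128)
out = layer(torch.randn(2, 10, 64))
print(out.shape)  # torch.Size([2, 10, 64])

Because the output shape depends only on dim_in and dim_out, additional parameter tokens can in principle be appended to an existing layer, which is the mechanism behind the paper's claim of incremental model scaling.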
Quick Start & Requirements
pip install -r requirements/requirements.txt
Additional requirements for flash attention, wandb, tensorboard, comet, and apex are available.
Maintenance & Community
The project is led by Haiyang Wang and Bernt Schiele. News and updates are shared via GitHub releases.
Licensing & Compatibility
The repository does not explicitly state a license in the README. Compatibility for commercial use or closed-source linking is not specified.
Limitations & Caveats
The codebase was developed and tested on Python 3.8; compatibility with newer Python versions may be limited by its dependencies. The authors note that the training code was released after only limited testing, so issues cannot be ruled out. Visual modeling benchmarks are slated for a later release.