AdderBoard by anadim

Minimal transformer for multi-digit addition

Created 1 month ago
345 stars

Top 80.4% on SourcePulse

View on GitHub
Project Summary

This repository presents the AdderBoard challenge: to engineer the smallest possible autoregressive transformer capable of adding two 10-digit numbers with over 99% accuracy. It targets researchers and engineers interested in extreme model compression, efficient transformer architectures, and exploring the fundamental capabilities of transformers beyond natural language processing. The primary benefit is advancing the state-of-the-art in minimal, performant neural network designs for specific computational tasks.

How It Works

The project frames integer addition as a sequence-to-sequence problem for transformers. Models must autonomously learn digit alignment, per-digit arithmetic, and carry propagation through self-attention, MLPs, and autoregressive generation, without hardcoded logic in their forward pass. This necessitates innovative approaches to tokenization, data formatting, and architectural design to minimize parameter counts. Submissions are categorized into "Trained" (weights learned via generic algorithms) and "Hand-coded" (weights analytically determined for constructive proof).
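The sequence framing described above can be sketched as follows. This is an illustrative assumption, not the repository's actual scheme: the zero-padding, the `a+b=` prompt layout, and the digit-reversal trick (emitting the least significant digit first so each output digit depends only on carries already resolved) are all hypothetical choices for the sketch.

```python
def make_example(a: int, b: int, n_digits: int = 10) -> tuple[str, str]:
    """Format an addition problem as (prompt, target) token strings.

    Digits are reversed (least significant first), a common trick that
    lets an autoregressive model emit each sum digit as soon as the
    corresponding carry is known.
    """
    fmt = f"0{n_digits}d"
    a_str = format(a, fmt)[::-1]                        # reversed, zero-padded operand
    b_str = format(b, fmt)[::-1]
    sum_str = format(a + b, f"0{n_digits + 1}d")[::-1]  # sum may carry one extra digit
    prompt = f"{a_str}+{b_str}="
    return prompt, sum_str

prompt, target = make_example(1234567890, 9876543210)
# prompt → "0987654321+0123456789=", target → "00111111111"
```

The model would be trained to continue `prompt` with `target` one digit at a time; the actual submissions may use very different tokenizations.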

Quick Start & Requirements

  • Primary Install/Run: No direct installation command provided. Submissions are typically verified using a provided verify.py script.
  • Prerequisites: Python.
  • Dependencies: Standard machine learning libraries are implied.
  • Links: verify.py script for testing submissions.

Highlighted Details

  • Achieves >= 99% accuracy on a held-out 10K test set of 10-digit number additions.
  • Leaderboards showcase "Hand-coded" models reaching as few as 12 parameters and "Trained" models around 67 parameters.
  • Key techniques include rank-1/low-rank projections, factorized embeddings, custom positional encodings (e.g., RoPE, ALiBi), weight tying, and curriculum learning.
  • Notable findings include a "parameter cliff" around 800 parameters for trained models and convergence on specific layer dimensions (d=4, d=7).
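To illustrate why rank-1/low-rank projections are effective for shrinking parameter counts (a sketch under assumed dimensions, not code from the repository): factorizing a dense `d_in × d_out` weight matrix as `U @ V` with rank `r` reduces its parameters from `d_in * d_out` to `r * (d_in + d_out)`.

```python
import numpy as np

d_in, d_out, r = 7, 7, 1                     # d=7 appears on the leaderboard; r=1 is rank-1
dense_params = d_in * d_out                  # 49 parameters for a full matrix
low_rank_params = r * (d_in + d_out)         # 14 parameters for the rank-1 factorization

rng = np.random.default_rng(0)
U = rng.standard_normal((d_in, r))
V = rng.standard_normal((r, d_out))
W = U @ V                                    # reconstructed weight matrix, rank 1 by construction
assert np.linalg.matrix_rank(W) == 1
```

The same counting argument applies to factorized embeddings, where a `vocab × d` table is split into `vocab × r` and `r × d` factors.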

Maintenance & Community

  • Maintained by Dimitris Papailiopoulos (@dimitrispapail).
  • No explicit community channels (e.g., Discord, Slack) are listed.

Licensing & Compatibility

  • License: MIT.
  • Compatibility: The MIT license permits commercial use and integration into closed-source projects.

Limitations & Caveats

The challenge is strictly defined for 10-digit integer addition. Models must adhere to a pure autoregressive transformer definition, forbidding task-specific control flow within the model's forward pass. Some leaderboard entries are marked as preliminary, requiring independent verification on the 10K test suite.

Health Check

  • Last commit: 2 weeks ago
  • Responsiveness: Inactive
  • Pull requests (30d): 6
  • Issues (30d): 7
  • Star history: 42 stars in the last 30 days

Starred by Jeremy Howard (Cofounder of fast.ai) and Stas Bekman (Author of "Machine Learning Engineering Open Book"; Research Engineer at Snowflake).

Explore Similar Projects

SwissArmyTransformer by THUDM

Transformer library for flexible model development

1k stars
Created 4 years ago
Updated 1 year ago

Starred by Benjamin Bolte (Cofounder of K-Scale Labs), Albert Gu (Cofounder of Cartesia; Professor at CMU), and 2 more.

Muon by KellerJordan

Optimizer for neural network hidden layers

2k stars
Created 1 year ago
Updated 2 months ago

Starred by Tobi Lutke (Cofounder of Shopify), Andrej Karpathy (Founder of Eureka Labs; formerly at Tesla, OpenAI; author of CS 231n), and 5 more.

matmulfreellm by ridgerchu

MatMul-free language models

3k stars
Created 1 year ago
Updated 4 months ago

Starred by Shizhe Diao (Author of LMFlow; Research Scientist at NVIDIA), Yineng Zhang (Inference Lead at SGLang; Research Scientist at Together AI), and 8 more.

EAGLE by SafeAILab

Speculative decoding research for faster LLM inference

2k stars
Created 2 years ago
Updated 1 month ago