TinyRecursiveModels by SamsungSAILMontreal

Tiny recursive models excel at complex reasoning

Created 5 months ago

6,391 stars

Top 8.0% on SourcePulse

View on GitHub

4 Experts Love This Project

Vincent Weisser

Cofounder of Prime Intellect

Victor Taelin

Author of Bend, Kind, HVM

Travis Fischer

Founder of Agentic

Aurélien Geron

Author of "Hands-on Machine Learning with Scikit-Learn, Keras & Tensorflow"

Project Summary

Recursive reasoning with tiny networks is addressed by the Tiny Recursive Model (TRM), a project aiming to challenge the necessity of massive foundational models for complex tasks. It targets researchers and engineers seeking efficient, parameter-light approaches to AI reasoning. TRM offers a cost-effective alternative, achieving notable performance on challenging benchmarks using a significantly smaller model footprint.

How It Works

TRM employs a recursive self-improvement mechanism. It iteratively refines its answer by updating a latent state (z) and the predicted answer (y) over a series of steps (K). This process involves recursively updating z based on the input question (x), current y, and z, followed by updating y using the current y and z. This approach allows for progressive error correction and answer refinement in a highly parameter-efficient manner, minimizing overfitting and simplifying prior recursive reasoning frameworks.

Quick Start & Requirements

Primary install: pip install -r requirements.txt after setting up PyTorch with CUDA.
Prerequisites: Python 3.10+, CUDA 12.6.0+, adam-atan2, and wandb for optional logging.
Dataset preparation involves executing specific Python scripts for ARC-AGI, Sudoku, and Maze datasets.
Example training commands are provided, often assuming multi-GPU setups (e.g., 4x H-100 GPUs for ARC-AGI).
Training times can be substantial, estimated at ~3 days for ARC-AGI on 4x H-100s.

Highlighted Details

Achieves 45% accuracy on ARC-AGI-1 and 8% on ARC-AGI-2.
Utilizes a tiny neural network with only 7 million parameters.
Demonstrates the "less is more" principle in recursive reasoning.

Maintenance & Community

No specific details regarding maintainers, community channels (like Discord/Slack), or roadmap were found in the provided text. The code is noted as being based on the Hierarchical Reasoning Model (HRM) codebase.

Licensing & Compatibility

The project's license is not specified in the README. This omission is a critical factor for assessing compatibility with commercial or closed-source applications.

Limitations & Caveats

The project's focus on minimal parameters for recursive reasoning may inherently limit its applicability to tasks requiring broad world knowledge or complex, non-recursive problem-solving. Specific CUDA version requirements (12.6.0+) could pose compatibility challenges. The absence of a stated license is a significant adoption blocker.

Health Check

Last Commit

3 months ago

Responsiveness

Inactive

Pull Requests (30d)

Issues (30d)

Star History

98 stars in the last 30 days