TinyRecursiveModels by SamsungSAILMontreal

Tiny recursive models excel at complex reasoning

Created 1 week ago

4,430 stars

Top 11.0% on SourcePulse

Project Summary

The Tiny Recursive Model (TRM) applies recursive reasoning with tiny networks, challenging the assumption that complex tasks require massive foundation models. It targets researchers and engineers seeking efficient, parameter-light approaches to AI reasoning. TRM offers a cost-effective alternative: it reaches 45% accuracy on ARC-AGI-1 and 8% on ARC-AGI-2 with a network of only about 7 million parameters.

How It Works

TRM employs a recursive self-improvement mechanism: it iteratively refines its answer by updating a latent state (z) and the predicted answer (y) over K steps. Each step first recursively updates z from the input question (x), the current y, and the current z, then updates y from the current y and z. This allows progressive error correction and answer refinement while staying highly parameter-efficient, which minimizes overfitting and simplifies prior recursive reasoning frameworks.
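
The control flow described above can be sketched in a few lines of plain Python. This is a minimal illustration of the loop structure only, not the project's implementation: `recursive_refine`, `n_latent`, and the two toy update functions are hypothetical names introduced here, and the real model would use a learned network in place of the toy arithmetic.

```python
from typing import Callable, List, Tuple

Vector = List[float]


def recursive_refine(
    x: Vector,
    y: Vector,
    z: Vector,
    update_latent: Callable[[Vector, Vector, Vector], Vector],
    update_answer: Callable[[Vector, Vector], Vector],
    K: int,
    n_latent: int = 1,  # assumed number of latent updates per step
) -> Tuple[Vector, Vector]:
    """Refine the answer y over K steps (loop structure only)."""
    for _step in range(K):
        # Recursively update the latent state from question, answer, and latent.
        for _inner in range(n_latent):
            z = update_latent(x, y, z)
        # Then update the answer from the current answer and latent.
        y = update_answer(y, z)
    return y, z


# Toy stand-ins for the learned network (assumptions for illustration only).
def toy_update_latent(x: Vector, y: Vector, z: Vector) -> Vector:
    # Stores the remaining error; this toy version ignores the previous z.
    return [xi - yi for xi, yi in zip(x, y)]


def toy_update_answer(y: Vector, z: Vector) -> Vector:
    # Moves the answer halfway toward the target implied by z.
    return [yi + 0.5 * zi for yi, zi in zip(y, z)]


x = [1.0, 2.0, 3.0]
y, z = recursive_refine(
    x, [0.0] * 3, [0.0] * 3, toy_update_latent, toy_update_answer, K=10
)
errors = [abs(xi - yi) for xi, yi in zip(x, y)]
```

In this toy setup the answer error halves at every step, which mirrors the idea of progressive correction: the same small update rule is reused K times instead of adding more parameters.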

Quick Start & Requirements

  • Primary install: pip install -r requirements.txt after setting up PyTorch with CUDA.
  • Prerequisites: Python 3.10+, CUDA 12.6.0+, adam-atan2, and wandb for optional logging.
  • Dataset preparation involves executing specific Python scripts for ARC-AGI, Sudoku, and Maze datasets.
  • Example training commands are provided, often assuming multi-GPU setups (e.g., 4x H100 GPUs for ARC-AGI).
  • Training times can be substantial, estimated at ~3 days for ARC-AGI on 4x H100 GPUs.
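
Before installing, it can help to check the listed prerequisites from Python. The script below is a hedged sketch, not part of the repository: `check_environment` is a hypothetical helper, and the import names `adam_atan2` and `wandb` are assumed to match the package names. Note that `torch.version.cuda` reports the CUDA version PyTorch was built against, which may differ from the system toolkit.

```python
import importlib.util
import sys


def check_environment() -> dict:
    """Report which of the listed prerequisites appear to be satisfied."""
    report = {
        "python_ok": sys.version_info >= (3, 10),
        # Import names below are assumed to match the package names.
        "adam_atan2_installed": importlib.util.find_spec("adam_atan2") is not None,
        "wandb_installed": importlib.util.find_spec("wandb") is not None,
        "torch_installed": importlib.util.find_spec("torch") is not None,
        "cuda_ok": False,
    }
    if report["torch_installed"]:
        try:
            import torch

            cuda = torch.version.cuda  # e.g. "12.6", or None for CPU-only builds
            if cuda is not None and torch.cuda.is_available():
                major, minor = (int(part) for part in cuda.split(".")[:2])
                report["cuda_ok"] = (major, minor) >= (12, 6)
        except Exception:
            # A broken install is reported as "not OK" rather than crashing.
            report["cuda_ok"] = False
    return report


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name}: {ok}")
```

Any `False` entry points at a prerequisite to resolve before running the dataset preparation scripts or training commands.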

Highlighted Details

  • Achieves 45% accuracy on ARC-AGI-1 and 8% on ARC-AGI-2.
  • Utilizes a tiny neural network with only 7 million parameters.
  • Demonstrates the "less is more" principle in recursive reasoning.
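
To put the 7 million parameter figure in perspective, the raw weight size can be estimated with simple arithmetic. The sketch below assumes standard dtype widths (4 bytes for float32, 2 bytes for bfloat16); the precision the project actually trains in is not stated here, and the estimate excludes optimizer state and activations. `weight_megabytes` is a hypothetical helper.

```python
PARAMS = 7_000_000  # parameter count quoted above

BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2}  # standard dtype widths


def weight_megabytes(n_params: int, dtype: str) -> float:
    """Size of the raw weights in megabytes (1 MB = 10**6 bytes)."""
    return n_params * BYTES_PER_PARAM[dtype] / 1e6


fp32_mb = weight_megabytes(PARAMS, "float32")   # 28.0 MB
bf16_mb = weight_megabytes(PARAMS, "bfloat16")  # 14.0 MB
```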

Maintenance & Community

The README gives no details on maintainers, community channels (such as Discord or Slack), or a roadmap. The code is based on the Hierarchical Reasoning Model (HRM) codebase.

Licensing & Compatibility

The project's license is not specified in the README. This omission is a critical factor for assessing compatibility with commercial or closed-source applications.

Limitations & Caveats

The project's focus on minimal parameters for recursive reasoning may limit its applicability to tasks that require broad world knowledge or complex, non-recursive problem-solving. The CUDA requirement (12.6.0+) could pose compatibility challenges on older setups. The absence of a stated license is a significant adoption blocker.

Health Check

  • Last commit: 1 week ago
  • Responsiveness: Inactive
  • Pull requests (30d): 6
  • Issues (30d): 14

Star History

  • 4,567 stars in the last 9 days

Explore Similar Projects

Starred by Shizhe Diao (Author of LMFlow; Research Scientist at NVIDIA), Eric Zhu (Coauthor of AutoGen; Research Scientist at Microsoft Research), and 7 more.

reasoning-gym by open-thought

1.2%
1k
Procedural dataset generator for reasoning models
Created 8 months ago
Updated 1 week ago
Starred by Peter Norvig (Author of "Artificial Intelligence: A Modern Approach"; Research Director at Google), Anton Bukov (Cofounder of 1inch Network), and 3 more.

HRM by sapientinc

2.7%
11k
Hierarchical reasoning for complex tasks
Created 3 months ago
Updated 1 month ago