Tiny Recursive Model

2025-12-22


The faster, simpler twin of an HRM

Tags: AI, Machine Learning, Reasoning Models

Large Language Models can be:

  • wide
  • deep
  • massive

But at inference time, they still execute a mostly fixed computation graph once per token.

That single fact is fatal for tasks that require:

  • multi-step search
  • hypothesis revision
  • backtracking
  • long-horizon planning

These are algorithmic problems, not perception problems.

No amount of parameter scaling fixes this.

The Core Failure Mode of LLMs

Across domains like Sudoku, mazes, and ARC, the failure pattern is consistent:

  • one wrong step poisons the entire output
  • chain-of-thought helps only superficially
  • increasing sampling or tokens explodes cost without improving reliability

LLMs do not reason.
They commit.

Once a bad assumption is made, there is no internal mechanism to revise it.

HRM Was Directionally Right, Structurally Overcomplicated

Hierarchical Reasoning Models showed something important:

Iterative latent computation matters more than scale.

But HRM framed the solution incorrectly.

It leaned on:

  • hierarchy
  • biological metaphors
  • fixed-point convergence
  • multiple interacting modules

TRM asks a simpler, sharper question:

What is the “high-level” state actually doing?

Answer:
It is just the current proposed solution.

And the “low-level” state?
Just latent reasoning state.

Once you accept this, the hierarchy collapses.

The Key Insight

Reasoning Is Iteration, Not Hierarchy

You do not need:

  • planners vs workers
  • fast vs slow modules
  • nested abstractions

You need only two things:

  • y: the current candidate solution
  • z: the internal reasoning state

And a way to repeatedly update both.

That is recursion.

What Is a Tiny Recursive Model (TRM)?

A TRM consists of:

  • a single tiny neural network (2 layers)
  • reused at every step
  • no role separation
  • no special modules

The behavior emerges from how the network is called, not from architectural complexity.

How Computation Actually Happens

Reasoning proceeds through explicit recursion.

Each step performs two updates:

1. Update reasoning state

z := net(x, y, z)

2. Refine the solution

y := net(y, z)

This loop is repeated many times.

Each iteration:

  • inspects the current solution
  • identifies errors implicitly
  • proposes a refinement

This is learned self-correction, end to end.

No decoding. No token sampling. No narration.
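The loop above can be sketched in a few lines of Python. This is an illustrative toy, not the real model: `net` here is a stand-in function (the actual network is a tiny 2-layer Transformer), and the zeroed first argument in the `y`-update mirrors the fact that `y := net(y, z)` omits the input `x`:

```python
import numpy as np

def net(x, y, z):
    # Toy stand-in for the single tiny network reused at every step.
    # (Hypothetical: any smooth update works for illustration.)
    return np.tanh(x + y + z)

def trm_solve(x, y, z, n_inner=6, n_cycles=3):
    """Explicit recursion: n_inner latent updates, then one answer
    refinement, repeated for n_cycles outer cycles."""
    for _ in range(n_cycles):
        for _ in range(n_inner):
            z = net(x, y, z)                  # z := net(x, y, z)
        y = net(np.zeros_like(x), y, z)       # y := net(y, z)
    return y, z
```

The same callable plays both roles; the only difference is which arguments it is handed. That is the "no role separation" claim made concrete.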

Depth Comes From Time, Not Layers

TRMs are deliberately small:

  • 2 layers
  • ~5 to 7M parameters

Why?

Because depth is created by recursion, not architecture.

A TRM with:

  • 2 layers
  • 6 inner recursions
  • 3 outer cycles

achieves over 40 sequential transformations.
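The "over 40" figure is plain bookkeeping, assuming the call structure described above (6 latent updates plus 1 answer update per cycle, each call running both layers):

```python
layers = 2          # layers in the tiny network
n_inner = 6         # latent-state updates per outer cycle
cycles = 3          # outer cycles

calls_per_cycle = n_inner + 1          # 6 z-updates + 1 y-update
effective_depth = layers * calls_per_cycle * cycles
print(effective_depth)                 # 42 sequential layer applications
```

A 42-layer Transformer would need roughly 20x the parameters; here the depth is rented from time instead of bought with weights.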

Unlike Transformers:

  • earlier assumptions can be revised
  • computation does not collapse into one pass
  • errors are not permanent

This is the difference between:

  • pattern recognition
  • algorithm execution

Deep Supervision

Learning to Improve, Not Just Predict

TRM does not wait until the end to apply loss.

After each reasoning segment:

  • an answer is produced
  • loss is computed
  • state is detached
  • reasoning continues

The model is trained to answer a different question:

“Given my current partial solution, how do I make it better?”

This is fundamentally different from:

“Given the input, predict the output.”

That difference is why TRMs generalize under tiny data regimes where large models overfit.

No Fixed Points, No Gradient Tricks

HRM depends on:

  • fixed-point assumptions
  • implicit gradients
  • one-step approximations

TRM removes all of it.

There is:

  • no convergence assumption
  • no equilibrium requirement
  • no gradient approximation

TRM backpropagates through the full recursion.

Yes, it costs more memory. Yes, it works better.

Ablations confirm this. One-step gradients collapse performance.

Adaptive Computation Without Complexity

TRM keeps adaptive computation time, but simplifies it:

  • a single halting head
  • binary supervision: “is the answer correct?”
  • no reinforcement learning
  • no second forward pass

Easy problems halt early. Hard problems get more compute.

This gives:

  • inference-time scaling
  • training efficiency
  • architectural simplicity
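The simplified halting mechanism can be sketched as follows, with a hypothetical `halt_head` (names and shapes are illustrative): training supervises it with a single binary target, and inference stops once the head is confident the current answer is already right:

```python
import torch

halt_head = torch.nn.Linear(8, 1)   # hypothetical: one confidence score per example

def should_halt(y, threshold=0.5):
    """Inference: halt when the head believes y is already correct."""
    p_correct = torch.sigmoid(halt_head(y)).mean()
    return p_correct.item() > threshold

def halting_loss(y, answer_is_correct):
    """Training: binary supervision only -- no RL, no second forward pass."""
    target = torch.full((y.shape[0], 1), float(answer_is_correct))
    return torch.nn.functional.binary_cross_entropy_with_logits(halt_head(y), target)
```

Because the target is just "was this answer correct?", the head is trained with the same labels the main loss already uses; easy inputs learn to trigger early exits, and hard ones keep looping.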

What the Model Actually Learns

Visualizations show a critical distinction:

  • y always decodes to a valid candidate solution
  • z never does

z is not symbolic. It is not interpretable.

It is pure reasoning state.

Across tasks, the same architecture learns:

  • constraint propagation for Sudoku
  • wavefront exploration for mazes
  • incremental rule induction for ARC

No hand-coded solvers. No task-specific logic.

Just recursion.

Why Smaller Models Generalize Better

One of the most uncomfortable results:

Making the network larger hurts performance.

Observed trends:

  • 4 layers worse than 2
  • MoE worse than dense
  • more parameters, faster overfitting

TRM works precisely because:

  • reasoning lives in time
  • not in the weights

This is the inverse of the LLM paradigm.

Why Hierarchy Was a Red Herring

HRM worked. But not because of hierarchy.

It worked because:

  • state persisted across steps
  • computation was iterative
  • answers were refined, not predicted

TRM removes:

  • multiple networks
  • biological framing
  • fixed-point math

And performs better.

Hierarchy was an explanation. Recursion is the mechanism.

Results That Actually Matter

With ~7M parameters and ~1000 training examples, TRM achieves:

  • 87% on Sudoku-Extreme
  • 85% on Maze-Hard
  • 45% on ARC-AGI-1
  • 8% on ARC-AGI-2

This beats:

  • HRM with 27M parameters
  • frontier LLMs with billions to trillions of parameters

No pretraining. No chain-of-thought. No token sampling.

The Real Takeaway

TRM demonstrates something uncomfortable:

Reasoning is not a scaling problem. It is a compute-structure problem.

If a model lacks:

  • persistent state
  • iterative refinement
  • self-revision

no amount of parameters will make it reason.

Transformers are elite token predictors. TRMs are learned recursive solvers.

What Comes Next

TRMs are not the endgame.

They are:

  • supervised
  • deterministic
  • non-generative

But they point clearly toward the future:

  • internal reasoning over external narration
  • recursion over architectural depth
  • state over tokens

Scaling token predictors gave us fluency.

Recursive architectures are how you get thinking.