Less is More: Recursive Reasoning
with Tiny Networks
The Problem
Why Do Giant LLMs Fail at Logic?
LLMs like GPT-5 and Claude Sonnet 4.5 are masters of
pattern matching and fluent text generation. However,
they hit a wall with structured reasoning tasks like
Sudoku, complex mazes, or ARC-AGI puzzles.
The core issue is their autoregressive nature—they
predict the next token without a built-in mechanism to
backtrack or correct a flawed logical step.
A single early mistake in a Sudoku grid cascades,
invalidating the entire solution. Scaling up data and
parameters doesn't fix this fundamental architectural
limitation.
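To make the failure mode concrete, here is a toy Python sketch of greedy autoregressive decoding; model is a placeholder for any next-token predictor, not a real API. Once a token is emitted, every later prediction conditions on it, and no step exists to revisit or repair it.

```python
# Toy sketch: greedy autoregressive decoding has no repair step.
# `model` is a stand-in for any next-token predictor (an assumption,
# not a real API); it maps the tokens so far to the next token.

def greedy_decode(model, prompt_tokens, max_steps=81):
    tokens = list(prompt_tokens)
    for _ in range(max_steps):
        next_token = model(tokens)  # most likely next token, given all prior ones
        tokens.append(next_token)   # committed forever: no backtracking exists
    return tokens

# If `model` places a wrong digit in an early Sudoku cell, every one of
# the remaining ~80 predictions conditions on that mistake.
```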
Source: Medium
Hierarchical Reasoning Models
Before TRMs, Hierarchical Reasoning Models (HRMs)
offered a promising path. HRMs used two small networks
working in a hierarchy: a fast "thinking" network and a
slower "conceptual" network that guided it.
By running these two networks in a nested, recursive
loop, HRMs showed that smaller models could rival
much larger ones on logical tasks. However, the
two-network design was complex, biologically
metaphorical, and computationally tricky to tune.
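A minimal sketch of that two-network recursion, assuming a fast update function f_low and a slow one f_high; the names, the T-step schedule, and the decode readout are illustrative choices, not the authors' exact code:

```python
# Illustrative HRM-style loop: the fast "thinking" network takes several
# small steps, then the slow "conceptual" network updates once to guide it.
# f_low, f_high, and decode are placeholders; T and n_cycles are assumed.

def hrm_solve(f_low, f_high, decode, x, z_low, z_high, T=4, n_cycles=8):
    for _ in range(n_cycles):
        for _ in range(T):
            z_low = f_low(x, z_low, z_high)  # fast, fine-grained reasoning
        z_high = f_high(z_high, z_low)       # slow, high-level guidance
    return decode(z_high)                    # read the answer off the slow state
```

Two separate networks and two coupled hidden states are exactly what made the scheme hard to tune.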
Source: Medium
The Breakthrough: TRMs
TRMs strip down the HRM concept to its elegant core.
They replace the two-network hierarchy with a single,
tiny network that operates in a recursive loop. This
network maintains two simple pieces of information:
y (the solution): The current best answer (e.g., a
Sudoku grid).
z (the reasoning state): A latent memory of its
current "thought process."
With each iteration, the network refines both y and z,
inching closer to the correct solution.
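A minimal sketch of that loop under the same hedges: net stands for the single tiny network, and the loop counts are illustrative, not the paper's exact hyperparameters. (In the paper, the answer update conditions only on y and z; both calls share one signature here for simplicity.)

```python
# Illustrative TRM-style recursion: one tiny network refines both the
# latent reasoning state z and the current answer y. `net` and the
# loop counts below are placeholders, not the paper's configuration.

def trm_refine(net, x, y, z, n_latent=6, n_cycles=3):
    for _ in range(n_cycles):
        for _ in range(n_latent):
            z = net(x, y, z)  # revise the latent "thought process"
        y = net(x, y, z)      # revise the current best answer from z
    return y, z
```

Because the same weights are reused at every step, effective reasoning depth grows with the number of iterations while the parameter count stays tiny.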
Source: Less is More: Recursive Reasoning with Tiny Networks
The Stunning Results
Tiny Model, Giant-Killing Performance
The empirical results are striking. On classic reasoning
benchmarks, a TRM with just ~7 million parameters
dramatically outperforms billion-parameter LLMs:
Sudoku-Extreme: TRM achieved 87.4% accuracy, while
state-of-the-art models like DeepSeek R1, Claude 3, and
o3-mini scored a shocking 0%.

ARC-AGI Puzzles: On this benchmark for fluid intelligence,
the TRM scored 44.6%, nearly tripling the performance of
DeepSeek R1 (15.8%) and other large models.

Maze-Hard: For complex pathfinding, TRM reached 85.3%
accuracy, significantly outperforming its predecessor
HRM (74.5%).
Source: Less is More: Recursive Reasoning with Tiny Networks