Awesome Loop Models

🌐 Interactive Browser · 🧾 PR Submission Guide

Search, filter, and explore loop-model papers and selected technical blogs with links to arXiv, code, OpenReview, and more.

Use the PR Submission Guide to generate YAML for papers or blogs, then copy the path and YAML into your fork / branch for the final pull request step.

A curated list of papers and selected long-form technical blogs on Loop Models: architectures where, within a single forward pass, a shared learned internal layer, block, module, or operator is reused.

News

2026-04-24 — Awesome Loop Models is released. Announcement

What Counts as a Loop Model?

This repository uses a strict definition:

By "loop model," we mean that, within a single forward pass of a model, a shared learned internal layer, block, module, or operator is reused.

This repo therefore includes papers that focus on loop models themselves, their mechanisms, applications, and designs. It excludes papers that are primarily about broader-scale iteration patterns that do not directly connect to loop models as defined above, such as agent loops, repeated full-model calls, external solver rounds, energy-based models, or plain sequence-time recurrence.
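To make the definition concrete, here is a minimal sketch in plain Python of the difference between reusing one block and stacking distinct layers. The function names (`shared_block`, `looped_forward`, `stacked_forward`) and the toy `tanh` block are illustrative assumptions, not code from any listed paper.

```python
import math


def shared_block(state, weights, bias):
    """Apply one toy 'learned' block: tanh(W @ state + b)."""
    return [
        math.tanh(sum(w * s for w, s in zip(row, state)) + b)
        for row, b in zip(weights, bias)
    ]


def looped_forward(x, weights, bias, num_loops):
    """Loop model: the SAME weights are reused num_loops times in one forward pass."""
    state = list(x)
    for _ in range(num_loops):
        state = shared_block(state, weights, bias)
    return state


def stacked_forward(x, layers):
    """Contrast case: a plain stack where each depth step may own different parameters."""
    state = list(x)
    for weights, bias in layers:
        state = shared_block(state, weights, bias)
    return state


# Hypothetical toy parameters, chosen only for illustration.
W = [[0.5, -0.2], [0.1, 0.3]]
b = [0.0, 0.1]

looped_out = looped_forward([1.0, -1.0], W, b, num_loops=3)
# A stack whose layers all share (W, b) computes the same function as the loop.
tied_stack_out = stacked_forward([1.0, -1.0], [(W, b)] * 3)
```

Under this sketch, an agent loop or repeated full-model call would sit outside `looped_forward` entirely, which is why those patterns fall outside the definition.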

Admittedly, loop models are deeply connected to the broader field of architecture and algorithm design (Diffusion, Energy-Based Models, etc.). We also welcome work that explicitly connects adjacent topics to loop models.

Only the rightmost end of this scale is in scope for the main paper list.

How the Repository Is Organized

The public browsing layer uses exactly three flat paper categories:

Theoretical and Mechanical Analysis — analytical papers whose main reader takeaway is understanding: theory, mechanism analysis, probing, diagnostics, or formal properties
Architecture and Algorithm Designs — papers that propose loop-model architectures or algorithms, often for better performance, efficiency, training, inference, or memory use
Applications Focused — papers whose main reader takeaway is loop-model performance on concrete external domains or tasks, such as robotics, VLA, multimodal tasks, tabular data, or graph data

In addition, selected long-form technical posts live in a separate flat Blogs section. Blogs can carry tags, but they do not use the paper taxonomy.

The paper categories are intentionally coarse. Foundation status plus Loop Mechanism / focus / domain tags carry secondary structure without introducing a separate lineage-tag axis.

Top-level categories do the minimum amount of work. Finer distinctions are pushed into:

Loop Mechanism (mechanism_tags) — loop-form labels only: hierarchical-loop, flat-loop, parallel-loop, or implicit-layer
focus_tags — whether the paper mainly studies objective-loss, training-algorithm, architecture, data, or inference-algorithm
domain_tags — problem/domain labels such as language-modeling, robotics-vla, multimodal, tabular-data, or graph-data
tags — optional aliases or model identifiers kept in YAML / README metadata, such as DEQ, UT, ACT, or Ouro

A paper can also carry foundation: true as a secondary badge when it is a canonical anchor such as ACT, Universal Transformers, or DEQ. Foundation is no longer a separate top-level shelf.
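As a rough illustration of how these fields fit together, the sketch below checks a paper entry, written as a Python dict, against the mechanism and focus labels listed above. The `title` key, the placeholder values, and the `validate_entry` helper are hypothetical; the actual YAML schema and the authoritative tag inventory live in the repository's own files.

```python
# Closed label sets copied from the lists above.
ALLOWED_MECHANISM_TAGS = {
    "hierarchical-loop", "flat-loop", "parallel-loop", "implicit-layer",
}
ALLOWED_FOCUS_TAGS = {
    "objective-loss", "training-algorithm", "architecture",
    "data", "inference-algorithm",
}


def validate_entry(entry):
    """Return a list of problems found in one entry; an empty list means it passed."""
    problems = []
    for tag in entry.get("mechanism_tags", []):
        if tag not in ALLOWED_MECHANISM_TAGS:
            problems.append(f"unknown mechanism tag: {tag}")
    for tag in entry.get("focus_tags", []):
        if tag not in ALLOWED_FOCUS_TAGS:
            problems.append(f"unknown focus tag: {tag}")
    # domain_tags and alias-style tags are open-ended here, so they are not checked.
    if not isinstance(entry.get("foundation", False), bool):
        problems.append("foundation must be true or false")
    return problems


# Placeholder entry; every value is made up for illustration.
example_entry = {
    "title": "Example Looped Model (placeholder)",  # hypothetical field name
    "mechanism_tags": ["flat-loop"],
    "focus_tags": ["architecture", "inference-algorithm"],
    "domain_tags": ["language-modeling"],
    "tags": ["ExampleAlias"],
    "foundation": False,
}

problems = validate_entry(example_entry)
```

A misspelled label such as `flat_loop` would be reported by this check, which is the kind of drift the preferred-spelling list is meant to prevent.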

In the interactive browser, the visible tag filters are Loop Mechanism, focus_tags, and domain_tags. Alias-style tags are not shown as browser filter chips there.
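One plausible way to think about these three filters is sketched below. It assumes an entry must carry every selected tag in every facet (AND semantics); that rule, the `matches` and `filter_entries` helpers, and the sample entries are assumptions for illustration, not the browser's actual implementation.

```python
# The three facets the browser exposes as filters, per the text above.
VISIBLE_FACETS = ("mechanism_tags", "focus_tags", "domain_tags")


def matches(entry, selected):
    """True if the entry carries every selected tag in each visible facet."""
    for facet in VISIBLE_FACETS:
        wanted = set(selected.get(facet, []))
        if not wanted <= set(entry.get(facet, [])):
            return False
    return True


def filter_entries(entries, selected):
    """Keep only the entries that satisfy the current filter selection."""
    return [entry for entry in entries if matches(entry, selected)]


# Placeholder entries; titles and tags are made up for illustration.
entries = [
    {"title": "Paper A", "mechanism_tags": ["flat-loop"],
     "focus_tags": ["architecture"], "domain_tags": ["language-modeling"],
     "tags": ["AliasA"]},
    {"title": "Paper B", "mechanism_tags": ["implicit-layer"],
     "focus_tags": ["inference-algorithm"], "domain_tags": ["graph-data"]},
]

flat_loop_titles = [
    e["title"] for e in filter_entries(entries, {"mechanism_tags": ["flat-loop"]})
]
```

Because alias-style `tags` are not among the visible facets, a selection keyed on `tags` is ignored in this sketch.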

See TAGS.md for the current tag inventory and preferred spellings before proposing a new tag.

See TAXONOMY.md for the full inclusion rule, paper category definitions, tie-break rules, and the flat Blogs-section rule.

The paper shelves are intentionally coarse: Theoretical and Mechanical Analysis, Architecture and Algorithm Designs, and Applications Focused. Foundation status plus Loop Mechanism / focus / domain tags carry secondary structure without introducing lineage buckets. Blogs are a separate flat section: they can carry tags, but they do not use the paper taxonomy.

Theoretical and Mechanical Analysis

Theoretical and Mechanical Analysis collects papers whose primary contribution is analysis: why loop models work, what formal properties they have, and what mechanisms they exhibit.

[05/30/2026] Looped Transformers with Layer Normalization Provably Learn the Power Method

Authors: Lyumin Wu, Chenyang Zhang, Yuan Cao · 2026

Loop Mechanism: flat-loop

Focus: architecture · training-algorithm

Domains: theory · algorithmic-reasoning

TL;DR: Proves that a looped linear transformer with layer normalization, trained only for principal component prediction, converges to a solution implementing the power method, with each self-attention layer performing one power iteration.
[05/29/2026] Chain-of-Thought and Compressed Looped Transformers: A Memory-Budget Separation

Authors: Haozhou Zhang · 2026

Loop Mechanism: flat-loop

Focus: architecture · inference-algorithm

Domains: language-modeling · reasoning · theory · memory-efficiency

TL;DR: Compares chain-of-thought scratchpads with compressed looped Transformers, arguing that looped hidden-state recurrence is bounded by its persistent memory budget even when more recurrent computation is applied.
[05/26/2026] Stabilizing Recurrent Dynamics for Test-Time Scalable Latent Reasoning in Looped Language Models

Authors: Xiao-Wen Yang, Ziyu Han, Xi-Hua Zhang, Wen-Da Wei, Jie-Jing Shao, Lan-Zhe Guo, Yu-Feng Li · 2026

Loop Mechanism: flat-loop

Focus: training-algorithm · inference-algorithm

Domains: language-modeling · reasoning · theory · scaling

TL;DR: Analyzes why Looped Language Models can collapse at larger recurrence depths and proposes STARS, a spectral-radius-regularized training framework that pushes latent dynamics toward stable fixed points for reliable test-time scaling.
[05/20/2026] Interaction Locality in Hierarchical Recursive Reasoning

Authors: Yosuke Miyanishi, Tetsuro Morimura · 2026

Loop Mechanism: hierarchical-loop · flat-loop

Focus: architecture · inference-algorithm

Domains: reasoning · algorithmic-reasoning

TL;DR: Proposes interaction locality as a mechanistic measurement framework for HRM and TRM, showing how repeated recursive updates accumulate local writes into broader solution structure on grid reasoning benchmarks.
[05/18/2026] One Model, Two Roles: Emergent Specialization in a Shared Recurrent Transformer

Authors: Jucheng Shen, Barbara Su, Anastasios Kyrillidis · 2026

Loop Mechanism: flat-loop · hierarchical-loop

Focus: architecture · inference-algorithm

Domains: reasoning · algorithmic-reasoning

TL;DR: Analyzes Asymmetric Input Recurrence, a two-state shared-weight recurrent Transformer where the same model updates L/H states, showing that state identity and input-injection asymmetry induce distinct proposal-vs-uncertainty roles on Sudoku-Extreme and Maze.
[05/08/2026] Bifurcation Models: Learning Set-Valued Solution Maps with Weight-Tied Dynamics

Authors: Caleb Jore, Jialin Liu · 2026

Loop Mechanism: flat-loop · implicit-layer

Focus: architecture · inference-algorithm

Domains: theory · algorithmic-reasoning

TL;DR: Studies weight-tied dynamics for set-valued solution maps, proving that regular equilibrium dynamics can represent multiple branches while repeated shared-operator iterations discover multiple valid equilibria on Ising and Allen-Cahn tasks.
[05/07/2026] Is One Layer Enough? Understanding Inference Dynamics in Tabular Foundation Models

Authors: Amir Rezaei Balef, Mykhailo Koshil, Katharina Eggensperger · ICML 2026

Loop Mechanism: flat-loop

Focus: architecture · inference-algorithm

Domains: tabular-data · reasoning

TL;DR: Analyzes layerwise inference dynamics in tabular foundation models and uses the observed depth redundancy to build a looped single-layer model that preserves comparable performance with about 20% of the original parameters.
[05/07/2026] Transformers Efficiently Perform In-Context Logistic Regression via Normalized Gradient Descent

Authors: Chenyang Zhang, Yuan Cao · 2026

Loop Mechanism: flat-loop

Focus: architecture · inference-algorithm · training-algorithm

Domains: theory · reasoning

TL;DR: Proves that softmax transformers can implement in-context logistic regression by treating layers as normalized-gradient-descent steps, then trains one self-attention layer and applies it recurrently as a looped model with convergence and OOD guarantees.
[04/28/2026] On Halting vs Converging in Recurrent Graph Neural Networks

Authors: Jeroen Bollen, Stijn Vansummeren · 2026

Loop Mechanism: flat-loop

Focus: architecture · inference-algorithm

Domains: theory · algorithmic-reasoning

TL;DR: Analyzes recurrent graph neural networks that repeatedly apply message passing until convergence or halting, proving expressiveness relationships between converging, output-converging, and halting RGNN variants.
[04/23/2026] Universal Transformers Need Memory: Depth-State Trade-offs in Adaptive Recursive Reasoning

Authors: Grigory Sapunov · 2026

Loop Mechanism: flat-loop

Focus: architecture · inference-algorithm

Domains: reasoning · algorithmic-reasoning

TL;DR: Studies a single-block Universal Transformer with ACT on Sudoku-Extreme, showing that learned memory tokens are necessary for non-trivial recursive-depth reasoning and that ACT initialization can trap the model in shallow computation.
[04/22/2026] How Much Is One Recurrence Worth? Iso-Depth Scaling Laws for Looped Language Models

Authors: Kristian Schwethelm, Daniel Rueckert, Georgios Kaissis · 2026

Loop Mechanism: flat-loop

Focus: architecture

Domains: language-modeling · scaling · efficiency

TL;DR: Measures the parameter value of recurrence in looped language models with iso-depth scaling laws, estimating how extra recurrent passes trade off against unique depth and training compute.
[04/16/2026] Stability and Generalization in Looped Transformers

Authors: Asher Labovich · 2026

Loop Mechanism: flat-loop · implicit-layer

Focus: inference-algorithm

Domains: reasoning · theory

TL;DR: Analyzes stability and generalization in looped transformers through a fixed-point framework, characterizing when recall and normalization yield reachable, input-dependent, and trainable loop dynamics.
[04/15/2026] Hierarchical vs. Flat Iteration in Shared-Weight Transformers

Authors: Sang-Il Han · 2026

Loop Mechanism: flat-loop · hierarchical-loop

Focus: architecture

Domains: language-modeling · scaling

TL;DR: Empirically compares hierarchical shared-weight recurrence against flat shared-weight iteration and independent-layer stacking, revealing a persistent representational gap for the recurrent hierarchy.
[04/13/2026] A Mechanistic Analysis of Looped Reasoning Language Models

Authors: Hugh Blayney, Álvaro Arroyo, Johan Obando-Ceron, Pablo Samuel Castro, Aaron Courville, Michael M. Bronstein, Xiaowen Dong · 2026

Loop Mechanism: implicit-layer

Focus: inference-algorithm

Domains: language-modeling · reasoning

TL;DR: Analyzes looped reasoning LLMs mechanistically, showing recurrent cycles converge to layer-specific fixed points and that feedforward-like inference stages repeat across latent recurrences.
[04/10/2026] Relational Preference Encoding in Looped Transformer Internal States

Authors: Jan Kirin · 2026

Loop Mechanism: flat-loop

Focus: training-algorithm · architecture

Domains: language-modeling · alignment

TL;DR: Probes looped transformer hidden states during iterative refinement, showing that human-preference information is encoded primarily in relational differences between loop states rather than independent per-state scores.
[04/09/2026] Loop, Think, & Generalize: Implicit Reasoning in Recurrent-Depth Transformers

Authors: Harsh Kohli, Srinivasan Parthasarathy, Huan Sun, Yuekun Yao · 2026

Loop Mechanism: flat-loop · implicit-layer

Focus: inference-algorithm

Domains: language-modeling · reasoning

TL;DR: Studies implicit reasoning in recurrent-depth transformers, showing that iterating shared transformer layers can unlock systematic generalization and depth extrapolation while also exposing overthinking limits.
[02/05/2026] Inverse Depth Scaling From Most Layers Being Similar

Authors: Yizhou Liu, Sara Kangaslahti, Ziming Liu, Jeff Gore · 2026

Loop Mechanism: flat-loop

Focus: architecture

Domains: language-modeling · theory

Community Comments: X Comment

TL;DR: Analyzes LLMs and toy residual networks to show loss scales inversely with depth when many layers are functionally similar and primarily reduce error via ensemble averaging.
[09/27/2025] Two-Scale Latent Dynamics for Recurrent-Depth Transformers

Authors: Francesco Pappone, Donato Crisostomi, Emanuele Rodolà · 2025

Loop Mechanism: flat-loop

Focus: inference-algorithm

Domains: language-modeling · reasoning

TL;DR: Analyzes recurrent-depth transformers through a two-scale latent-dynamics lens, showing shrinking and increasingly orthogonal loop updates and deriving a second-order early-exit criterion that improves latency-quality trade-offs.
[07/02/2025] Latent Chain-of-Thought? Decoding the Depth-Recurrent Transformer

Authors: Wenquan Lu, Yuechuan Yang, Kyle Lee, Yanshu Li, Enqi Liu · 2025

Loop Mechanism: flat-loop

Focus: inference-algorithm

Domains: language-modeling · reasoning

TL;DR: Probes a depth-recurrent Transformer to test whether latent chain-of-thought structure emerges across recurrence steps, finding limited evidence and recurrence-depth-dependent interpretability effects.
[02/24/2025] Reasoning with Latent Thoughts: On the Power of Looped Transformers

Authors: Nikunj Saunshi, Nishanth Dikkala, Zhiyuan Li, Sanjiv Kumar, Sashank J. Reddi · ICLR 2025

Loop Mechanism: flat-loop

Focus: training-algorithm · inference-algorithm

Domains: language-modeling · reasoning

Community Comments: Reza Bayat reading list (#7)

TL;DR: Studies looped transformers as reasoning models, showing effective-depth scaling, latent-thought simulation of chain-of-thought, and a looping-based regularizer that improves the reasoning-versus-memorization trade-off.
[10/02/2024] On Expressive Power of Looped Transformers: Theoretical Analysis and Enhancement via Timestep Encoding

Authors: Kevin Xu, Issei Sato · 2024

Loop Mechanism: flat-loop

Focus: architecture

Domains: language-modeling · reasoning · theory

TL;DR: Analyzes the expressive power of looped transformers, derives approximation-rate limits, and shows that timestep encoding improves their function-approximation behavior.
[11/21/2023] Looped Transformers are Better at Learning Learning Algorithms

Authors: Liu Yang, Kangwook Lee, Robert Nowak, Dimitris Papailiopoulos · ICLR 2024

Loop Mechanism: flat-loop

Focus: architecture · training-algorithm

Domains: algorithmic-reasoning

Community Comments: Benhao's reading note · Reza Bayat reading list (#5)

TL;DR: Proposes looped-transformer training for in-context data-fitting tasks, showing comparable performance to standard transformers with under 10% of the parameters by better matching iterative learning algorithms.
[01/30/2023] Looped Transformers as Programmable Computers

Authors: Angeliki Giannou, Shashank Rajput, Jy-yong Sohn, Kangwook Lee, Jason D. Lee, Dimitris Papailiopoulos · 2023

Loop Mechanism: flat-loop

Focus: architecture · inference-algorithm

Domains: algorithmic-reasoning

Community Comments: Reza Bayat reading list (#4)

TL;DR: Shows that a shallow looped transformer can emulate instruction-set computation and iterative algorithms such as SGD or matrix inversion, with the recurrence acting as a reusable program counter.

Architecture and Algorithm Designs

Architecture and Algorithm Designs collects the constructive side of the field: new looped architectures, algorithms, recurrent computation graphs, and efficiency or memory-compression methods.

[06/03/2026] LoopMoE: Unifying Iterative Computation with Mixture-of-Experts for Language Modeling

Authors: Wenkai Chen, Tianshu Li, Wenyong Huang, Yichun Yin, Lifeng Shang, Chengwei Qin · 2026

Loop Mechanism: flat-loop

Focus: architecture · training-algorithm

Domains: language-modeling · efficiency · scaling

TL;DR: Introduces LoopMoE, a looped mixture-of-experts language model that combines sparse routing with iterative weight-shared computation through iteration-conditioned modulation and capacity balancing.
[05/31/2026] CART: Context-Anchored Recurrent Transformer -- A Parameter-Efficient Architecture with Learned Stability

Authors: Chad A. Capps · 2026

Loop Mechanism: flat-loop

Focus: architecture · training-algorithm

Domains: language-modeling · efficiency · scaling

TL;DR: Introduces a compact language model that reuses a single shared transformer core across depth while anchoring recurrence to precomputed key-value tensors and reports a mostly negative parameter-parity result against dense baselines.
[05/29/2026] Fixed-Point Masked Generative Modeling

Authors: Andrea Miele, Yiming Qin, Alba Carballo-Castro, Justin Deschenaux, Pascal Frossard · 2026

Loop Mechanism: implicit-layer

Focus: architecture · training-algorithm · inference-algorithm

Domains: language-modeling · vision · efficiency

TL;DR: Replaces part of a masked generative model denoiser with a fixed-point solver over shared attention layers, using consistency training and solver-state reuse to adapt depth with fewer parameters.
[05/27/2026] CosmicFish-HRM: Adaptive Reasoning via Hierarchical Recurrent Mechanisms in Compact Language Models

Authors: Venkat Akhil Lakkapragada · 2026

Loop Mechanism: hierarchical-loop

Focus: architecture · inference-algorithm

Domains: language-modeling · reasoning · efficiency

TL;DR: Explores a compact autoregressive language model with a Hierarchical Reasoning Module that iterates through high-level and low-level reasoning cycles and learns input-dependent halting behavior for adaptive reasoning depth.
[05/25/2026] Looped Diffusion Language Models

Authors: Sanghyun Lee, Chunsan Hong, Seungryong Kim, Jonghyun Lee, Jongho Park, Dongmin Park · 2026

Loop Mechanism: flat-loop

Focus: architecture · training-algorithm · inference-algorithm

Domains: language-modeling · reasoning · efficiency · scaling

TL;DR: Introduces LoopMDM, selectively looping early-middle transformer layers in masked diffusion language models so training gains depth-scaling without extra parameters and inference can vary loop count for compute scaling.
[05/22/2026] Training-Free Looped Transformers

Authors: Lizhang Chen, Jonathan Li, Chen Liang, Ni Lao, Qiang Liu · 2026

Loop Mechanism: flat-loop

Focus: architecture · inference-algorithm

Domains: language-modeling · reasoning · efficiency · scaling

TL;DR: Retrofits frozen pretrained transformers with a training-free inference wrapper that repeatedly applies a contiguous mid-stack layer block as damped refinement sub-steps, improving several QA and reasoning benchmarks without fine-tuning.
[05/20/2026] Equilibrium Reasoners: Learning Attractors Enables Scalable Reasoning

Authors: Benhao Huang, Zhengyang Geng, Zico Kolter · ICML 2026

Loop Mechanism: flat-loop · implicit-layer

Focus: architecture · inference-algorithm

Domains: reasoning · algorithmic-reasoning · scaling

TL;DR: Formalizes Equilibrium Reasoners as learned latent dynamical systems whose repeated update rule converges toward task-conditioned attractors, enabling depth and breadth test-time scaling for reasoning.
[05/20/2026] LT2: Linear-Time Looped Transformers

Authors: Chunyuan Deng, Yizhe Zhang, Rui-Jie Zhu, Yuanyuan Xu, Jiarui Liu, T. S. Eugene Ng, Hanjie Chen · 2026

Loop Mechanism: flat-loop

Focus: architecture · inference-algorithm

Domains: language-modeling · reasoning · efficiency · scaling

TL;DR: Introduces LT2, a looped-transformer family that replaces quadratic attention with linear or sparse attention so repeated loop steps refine memory and expand receptive field while keeping inference more scalable.
[05/19/2026] Generative Recursive Reasoning

Authors: Junyeob Baek, Mingyu Jo, Minsu Kim, Mengye Ren, Yoshua Bengio, Sungjin Ahn · 2026

Loop Mechanism: flat-loop · parallel-loop

Focus: architecture · objective-loss · training-algorithm · inference-algorithm

Domains: reasoning · algorithmic-reasoning

TL;DR: Introduces GRAM, a probabilistic recursive-reasoning framework that models reasoning as stochastic latent trajectories, enabling multi-hypothesis computation, variational training, and inference-time scaling through depth and parallel sampling.
[05/19/2026] Probabilistic Tiny Recursive Model

Authors: Amin Sghaier, Ali Parviz, Alexia Jolicoeur-Martineau · 2026

Loop Mechanism: hierarchical-loop · flat-loop · parallel-loop

Focus: inference-algorithm

Domains: reasoning · algorithmic-reasoning · efficiency

TL;DR: Introduces PTRM, an inference-time scaling framework for Tiny Recursive Models that injects Gaussian noise into recursive latent updates, runs parallel trajectories, and selects the final answer with the model's Q head without retraining.
[05/18/2026] HRM-Text: Efficient Pretraining Beyond Scaling

Authors: Guan Wang, Changling Liu, Chenyu Wang, Cai Zhou, Yuhao Sun, Yifei Wu, Shuai Zhen, Luca Scimeca, Yasin Abbasi Yadkori · Preprint 2026

Loop Mechanism: hierarchical-loop · flat-loop

Focus: architecture · training-algorithm · objective-loss · data

Domains: language-modeling · reasoning · efficiency

TL;DR: Introduces HRM-Text, a 1B Hierarchical Recurrent Model language model that combines dual-timescale recurrent Transformer modules with MagicNorm, warmup deep credit assignment, PrefixLM masking, and task-completion pretraining for efficient training from 40B unique tokens.
[05/15/2026] Looped SSMs: Depth-Recurrence and Input Reshaping for Time Series Classification

Authors: Mónika Farsang, Ramin Hasani, Daniela Rus, Radu Grosu · 2026

Loop Mechanism: flat-loop

Focus: architecture · inference-algorithm

Domains: sequence-modeling · efficiency · scaling

TL;DR: Extends looped-transformer depth recurrence to state-space models by reusing the same SSM block across depth and adding input reshaping, showing tied-depth SSMs match or beat untied SSMs on six time-series benchmarks despite fewer parameters.
[05/12/2026] Solve the Loop: Attractor Models for Language and Reasoning

Authors: Jacob Fein-Ashley, Paria Rashidinejad · 2026

Loop Mechanism: flat-loop · implicit-layer

Focus: architecture · training-algorithm · inference-algorithm

Domains: language-modeling · reasoning · scaling · efficiency

TL;DR: Introduces Attractor Models, where a backbone proposes output embeddings and an attractor module iteratively solves a fixed point with implicit differentiation, improving looped language modeling and small-model reasoning while allowing adaptive convergence-depth inference.
[05/11/2026] Simply Stabilizing the Loop via Fully Looped Transformer

Authors: Rao Fu, Zixuan Yang, Jiankun Zhang, Jing Ma, Hechang Chen, Yu Li, Yi Chang · 2026

Loop Mechanism: flat-loop

Focus: architecture · training-algorithm · inference-algorithm

Domains: language-modeling · reasoning · scaling · efficiency

TL;DR: Stabilizes looped transformers with parameter-free fully looped signal routing and attention injection, enabling stable training at higher loop counts while preserving test-time loop-depth control.
[05/10/2026] LoopUS: Recasting Pretrained LLMs into Looped Latent Refinement Models

Authors: Taekhyun Park, Yongjae Lee, Dohee Kim, Hyerim Bae · 2026

Loop Mechanism: flat-loop

Focus: architecture · training-algorithm · inference-algorithm

Domains: language-modeling · reasoning · efficiency · scaling

TL;DR: Converts pretrained LLMs into encoder, looped reasoning block, and decoder components, using selective gating, random deep supervision, and adaptive early exiting to stabilize latent looping without training recurrent models from scratch.
[05/09/2026] Quantum Injection Pathways for Implicit Graph Neural Networks

Authors: Pengyuan Xu, Tristan Zaborniak, Luis F. Rivera, Hausi A. Müller · 2026

Loop Mechanism: implicit-layer

Focus: architecture · inference-algorithm

Domains: theory · efficiency

TL;DR: Formulates quantum-signal injection pathways for graph deep-equilibrium models, comparing fixed, state-dependent, and backbone-dependent coupling inside the fixed-point operator with contraction guarantees and graph-classification experiments.
[05/09/2026] Sparse Layers are Critical to Scaling Looped Language Models

Authors: Ryan Lee, Jacob Biloki, Edward J. Hu, Jonathan May · 2026

Loop Mechanism: flat-loop

Focus: architecture · inference-algorithm

Domains: language-modeling · scaling · efficiency · MoE

TL;DR: Shows that MoE-style sparse layers can make looped language models scale better than dense looped transformers, with routing divergence across repeated shared layers recovering expressivity and loop boundaries serving as effective early-exit points.
[05/08/2026] Memory-Efficient Looped Transformer: Decoupling Compute from Memory in Looped Language Models

Authors: Victor Conchello Vendrell, Arnau Padres Masdemont, Niccolò Grillo, Jordi Ros-Giralt, Arash Behboodi, Fabio Valerio Massoli · 2026

Loop Mechanism: flat-loop

Focus: architecture · training-algorithm

Domains: language-modeling · reasoning · efficiency · memory-efficiency

TL;DR: Memory-Efficient Looped Transformer enables constant-memory iterative reasoning by sharing a single KV cache across loops, achieving strong performance without the linear memory scaling of prior looped LLMs.
[04/23/2026] Hyperloop Transformers

Authors: Abbas Zeitoun, Lucas Torroba-Hennigen, Yoon Kim · 2026

Loop Mechanism: flat-loop

Focus: architecture

Domains: language-modeling · efficiency · memory-efficiency

Community Comments: Turing Posts

TL;DR: Introduces Hyperloop Transformers, a parameter-efficient looped Transformer that applies only a middle block recurrently and adds hyper-connections between loops to improve memory-efficient language modeling.
[04/20/2026] One Step Forward and K Steps Back: Better Reasoning with Denoising Recursion Models

Authors: Chris Cameron, Wangzheng Wang, Nikita Ivanov, Ashmita Bhattacharyya, Didier Chételat, Yingxue Zhang · 2026

Loop Mechanism: flat-loop

Focus: training-algorithm · inference-algorithm · architecture

Domains: reasoning · algorithmic-reasoning

TL;DR: Introduces Denoising Recursion Models, a looped-transformer training method that corrupts targets and trains recursive refinement over multiple steps, improving ARC-AGI reasoning over TRM.
[04/19/2026] LASER: Low-Rank Activation SVD for Efficient Recursion

Authors: Ege Çakar, Ketan Ali Raghu, Lia Zheng · 2026

Loop Mechanism: hierarchical-loop

Focus: architecture · inference-algorithm

Domains: efficiency

TL;DR: Analyzes Tiny Recursive Model activation geometry during recursive unrolling and introduces LASER, a dynamic low-rank activation compression method that cuts recursive activation memory by ~60% without statistically significant accuracy loss.
[04/14/2026] 🌟 Parcae: Scaling Laws For Stable Looped Language Models

Authors: Hayden Prairie, Zachary Novack, Taylor Berg-Kirkpatrick, Daniel Y. Fu · 2026

Loop Mechanism: flat-loop

Focus: objective-loss · architecture

Domains: language-modeling · reasoning

Community Comments: Benhao's reading note

TL;DR: Introduces Parcae, a stable looped language model that constrains injection spectral norms to prevent instability and studies isoFLOPs-style training- and test-time scaling laws for quality gains under fixed-parameter budgets.
[04/10/2026] ELT: Elastic Looped Transformers for Visual Generation

Authors: Sahil Goyal, Swayam Agrawal, Gautham Govind Anil, Prateek Jain, Sujoy Paul, Aditya Kusupati · 2026

Loop Mechanism: flat-loop

Focus: architecture · training-algorithm · inference-algorithm

Domains: vision · efficiency

Community Comments: Tweet by Grigory Sapunov · Grigory Sapunov's reading notes

TL;DR: Introduces Elastic Looped Transformers for image and video generation, using weight-shared recurrent transformer blocks plus Intra-Loop Self Distillation to support any-time inference with dynamic quality-compute trade-offs from a single training run.
[03/23/2026] Thinking Deeper, Not Longer: Depth-Recurrent Transformers for Compositional Generalization

Authors: Hung-Hsuan Chen · 2026

Loop Mechanism: flat-loop

Focus: architecture · inference-algorithm

Domains: reasoning · compositional-reasoning

TL;DR: Introduces a depth-recurrent Transformer for compositional generalization, with silent thinking, LayerScale, and identity-biased recurrence enabling stable deep latent iteration.
[03/20/2026] LoopRPT: Reinforcement Pre-Training for Looped Language Models

Authors: Guo Tang, Shixin Jiang, Heng Chang, Nuo Chen, Yuhan Li, Huiming Fan, Jia Li, Ming Liu, Bing Qin · 2026

Loop Mechanism: flat-loop

Focus: objective-loss · training-algorithm

Domains: language-modeling · reasoning · RL

TL;DR: Proposes LoopRPT, a reinforcement pre-training method for looped language models that assigns learning signals to latent iterations, improving accuracy-compute trade-offs and strengthening early-stage reasoning on Ouro.
[03/09/2026] Adaptive Loops and Memory in Transformers: Think Harder or Know More?

Authors: Markus Frey, Behzad Shomali, Ali Hamza Bashir, David Berghaus, Joachim Koehler, Mehdi Ali · 2026

Loop Mechanism: flat-loop

Focus: architecture

Domains: language-modeling · reasoning · efficiency

TL;DR: Introduces transformers with adaptive per-layer looping and gated memory banks, showing that combining learned halting with extra storage improves reasoning under matched parameter and FLOP budgets.
[03/09/2026] Tiny Autoregressive Recursive Models

Authors: Paulius Rauba, Claudio Fanconi, Mihaela van der Schaar · 2026

Loop Mechanism: hierarchical-loop · flat-loop

Focus: architecture · training-algorithm · inference-algorithm

Domains: algorithmic-reasoning · language-modeling

Community Comments: Benhao's reading note

TL;DR: Studies autoregressive Tiny Recursive Models under compute-matched baselines, finding that simple two-step refinement helps on small algorithmic tasks while the full Autoregressive TRM shows no reliable gains.
[03/05/2026] Mixture of Universal Experts: Scaling Virtual Width via Depth-Width Transformation

Authors: Yilong Chen, Naibin Gu, Junyuan Shang, Zhenyu Zhang, Yuchen Feng, Jiawei Sheng, Tingwen Liu, Shuohuan Wang, Yu Sun, Hua Wu, Haifeng Wang · 2026

Loop Mechanism: flat-loop

Focus: objective-loss · architecture · inference-algorithm

Domains: language-modeling · efficiency · MoE

Community Comments: Benhao's reading note

TL;DR: Proposes MOUE, which reuses a universal layer-agnostic expert pool across layers to transform depth into virtual width and improve MoE performance under fixed activation budgets.
[03/05/2026] Recursive Inference Machines for Neural Reasoning

Authors: Mieszko Komisarczyk, Saurabh Mathur, Maurice Kraus, Sriraam Natarajan, Kristian Kersting · 2026

Loop Mechanism: hierarchical-loop

Focus: architecture · inference-algorithm

Domains: reasoning · RL

Community Comments: Benhao's reading note

TL;DR: Introduces Recursive Inference Machines, a recurrent reasoning framework that casts TRMs as a special case and improves ARC-AGI, Sudoku, and tabular classification by reweighting the history of loop states.
[03/02/2026] AdaPonderLM: Gated Pondering Language Models with Token-Wise Adaptive Depth

Authors: Shixiang Song, He Li, Zitong Wang, Boyi Zeng, Feichen Song, Yixuan Wang, Zhiqin John Xu, Ziwei He, Zhouhan Lin · 2026

Loop Mechanism: flat-loop

Focus: architecture · training-algorithm · inference-algorithm

Domains: language-modeling · reasoning · efficiency

TL;DR: Introduces AdaPonderLM, a self-supervised recurrent language model with token-wise halting gates and KV reuse, allocating more loop steps to hard tokens under a fixed compute budget.
[02/12/2026] SpiralFormer: Looped Transformers Can Learn Hierarchical Dependencies via Multi-Resolution Recursion

Authors: Chengting Yu, Xiaobo Shu, Yadao Wang, Yizhen Zhang, Haoyi Wu, You Wu, Rujiao Long, Ziheng Chen, Yuchi Xu, Wenbo Su, Bo Zheng · 2026

Loop Mechanism: hierarchical-loop

Focus: architecture · inference-algorithm

Domains: language-modeling · reasoning

TL;DR: Introduces SpiralFormer, a looped transformer that applies shared layers under a multi-resolution recursion schedule to learn hierarchical dependencies more efficiently than fixed-resolution recurrent baselines.
[02/11/2026] LoopFormer: Elastic-Depth Looped Transformers for Latent Reasoning via Shortcut Modulation

Authors: Ahmadreza Jeddi, Marco Ciccone, Babak Taati · ICLR 2026

Loop Mechanism: flat-loop

Focus: architecture · training-algorithm · inference-algorithm

Domains: language-modeling · reasoning · efficiency

TL;DR: Introduces LoopFormer, trained on variable-length trajectories to enable budget-conditioned reasoning. Uses shortcut-consistency regularization to ensure stable internal trajectories across different loop depths.
[02/11/2026] Prioritize the Process, Not Just the Outcome: Rewarding Latent Thought Trajectories Improves Reasoning in Looped Language Models

Authors: Jonathan Williams, Esin Tureci · 2026

Loop Mechanism: flat-loop

Focus: objective-loss · training-algorithm

Domains: language-modeling · reasoning

TL;DR: Introduces RLTT, a reinforcement-learning objective that assigns reward across the full latent thought trajectory of looped language models rather than only the final latent state.
[02/09/2026] Looping Back to Move Forward: Recursive Transformers for Efficient and Flexible Large Multimodal Models

Authors: Ruihan Xu, Yuting Gao, Lan Wang, Jianing Li, Weihao Chen, Qingpei Guo, Ming Yang, Shiliang Zhang · 2026

Loop Mechanism: hierarchical-loop

Focus: architecture · inference-algorithm

Domains: vision · efficiency

TL;DR: Introduces RecursiveVLM, a recursive multimodal transformer with a recursive connector and monotonic recursion loss that enables on-demand extra refinement under varying compute budgets.
[02/09/2026] Understanding Dynamic Compute Allocation in Recurrent Transformers

Authors: Ibraheem Muhammad Moosa, Suhas Lohit, Ye Wang, Moitreya Chatterjee, Wenpeng Yin · 2026

Loop Mechanism: flat-loop

Focus: architecture · inference-algorithm

Domains: language-modeling · algorithmic-reasoning · efficiency

Community Comments: Benhao's reading note

TL;DR: Proposes ANIRA, a recurrent Transformer framework for per-token variable-depth computation, and shows adaptive compute can align with token complexity while failing to extrapolate to longer algorithmic inputs.
[01/29/2026] Depth-Recurrent Attention Mixtures: Giving Latent Reasoning the Attention it Deserves

Authors: Jonas Knupp, Jan Hendrik Metzen, Jeremias Bohn, Georg Groh, Kristian Kersting · 2026

Loop Mechanism: parallel-loop · flat-loop

Focus: architecture · inference-algorithm

Domains: language-modeling · reasoning · efficiency

Community Comments: Benhao's reading note

TL;DR: Introduces a modular framework combining sequence attention and depth attention for recurrent-depth models, improving FLOP-, parameter-, and memory-efficiency simultaneously.
[01/26/2026] ChainGPT: Dual-Reasoning Model with Recurrent Depth and Multi-Rank State Updates

Authors: Yunao Zheng, Xiaojie Wang, Lei Ren, Chen Wei · ICLR 2026

Loop Mechanism: flat-loop

Focus: architecture · training-algorithm · inference-algorithm

Domains: language-modeling · reasoning

TL;DR: Introduces ChainGPT, a dual-reasoning recurrent-depth architecture that combines multi-substep state updates and state-guided sparse attention to move reasoning into latent computation, with adaptive stopping as a supporting mechanism.
[01/26/2026] MoDr: Mixture-of-Depth-Recurrent Transformers for Test-Time Reasoning

Authors: Xiaojing Zhang, Haifeng Wu, Gang He, Jiyang Shen, Bochen Lyu, Zhanxing Zhu · ICLR 2026

Loop Mechanism: flat-loop

Focus: architecture · inference-algorithm · training-algorithm

Domains: language-modeling · reasoning · efficiency · MoE

TL;DR: Introduces MoDr, which adds multi-branch routing to a depth-recurrent Transformer so looped models can explore solution paths more adaptively at test time.
[12/16/2025] Universal Reasoning Model

Authors: Zitian Gao, Lynx Chen, Yihao Xiao, He Xing, Ran Tao, Haoming Luo, Joey Zhou, Bryan Dai · 2025

Loop Mechanism: flat-loop

Focus: architecture · training-algorithm

Domains: algorithmic-reasoning · reasoning

TL;DR: Proposes URM, a Universal Transformer-based architecture with weight tying that beats standard transformers on reasoning benchmarks through iterative depth computation.
[11/11/2025] Think-at-Hard: Selective Latent Iterations to Improve Reasoning Language Models

Authors: Tianyu Fu, Yichen You, Zekai Chen, Guohao Dai, Huazhong Yang, Yu Wang · 2025

Loop Mechanism: flat-loop

Focus: architecture · inference-algorithm

Domains: language-modeling · reasoning · efficiency

TL;DR: Introduces Think-at-Hard, a dynamic latent-thinking method that uses a learned decider to apply extra recurrent latent iterations only to hard tokens, with LoRA refiners and duo-causal attention across iteration depth.
[11/10/2025] Teaching Pretrained Language Models to Think Deeper with Retrofitted Recurrence

Authors: Sean McLeish, Ang Li, John Kirchenbauer, Dayal Singh Kalra, Brian R. Bartoldson, Bhavya Kailkhura, Avi Schwarzschild, Jonas Geiping, Tom Goldstein, Micah Goldblum · 2025

Loop Mechanism: flat-loop

Focus: architecture · training-algorithm

Domains: efficiency · language-modeling · reasoning

TL;DR: Presents a framework for retrofitting pretrained feedforward language models with depth recurrence, improving training efficiency for depth-recurrent models and enabling greater FLOP efficiency than comparable feedforward models.
[10/29/2025] 🌟 Scaling Latent Reasoning via Looped Language Models

Authors: Rui-Jie Zhu, Zixuan Wang, Kai Hua, Tianyu Zhang, Ziniu Li, Haoran Que, Boyi Wei, Zixin Wen, Fan Yin, He Xing, Lu Li, Jiajun Shi, Kaijing Ma, Shanda Li, Taylor Kergan, Andrew Smith, Xingwei Qu, Mude Hui, Bohong Wu, Qiyang Min, Hongzhi Huang, Xun Zhou, Wei Ye, Jiaheng Liu, Jian Yang, Yunfeng Shi, Chenghua Lin, Enduo Zhao, Tianle Cai, Ge Zhang, Wenhao Huang, Yoshua Bengio, Jason Eshraghian · 2025

Loop Mechanism: flat-loop

Focus: objective-loss · architecture · data · training-algorithm · inference-algorithm

Domains: language-modeling · reasoning

Community Comments: Reza Bayat reading list (#10)

TL;DR: Introduces Ouro, a family of pre-trained Looped Language Models (1.4B and 2.6B) reported to match standard LLMs of up to 12B parameters. Establishes loop depth as a third scaling axis beyond model size and data.
[10/28/2025] Parallel Loop Transformer for Efficient Test-Time Computation Scaling

Authors: Bohong Wu, Mengzhao Chen, Xiang Luo, Shen Yan, Qifan Yu, Fan Xia, Tianqi Zhang, Hongrui Zhan, Zheng Zhong, Xun Zhou, Siyuan Qiao, Xingyan Bin · 2025

Loop Mechanism: parallel-loop · flat-loop

Focus: inference-algorithm

Domains: language-modeling · reasoning · efficiency

TL;DR: Introduces the Parallel Loop Transformer, which preserves looped-model accuracy while reducing latency and memory through cross-loop parallelism and shared-loop KV representations.
[10/06/2025] Less is More: Recursive Reasoning with Tiny Networks

Authors: Alexia Jolicoeur-Martineau · 2025

Loop Mechanism: hierarchical-loop · flat-loop

Focus: architecture · inference-algorithm · training-algorithm

Domains: reasoning

TL;DR: Proposes Tiny Recursive Model (TRM), a single tiny network that recursively refines latent state and answer over multiple improvement steps, outperforming HRM and many larger models on ARC-AGI-style reasoning tasks.
[10/03/2025] Coevolutionary Continuous Discrete Diffusion: Make Your Diffusion Language Model a Latent Reasoner

Authors: Cai Zhou, Chenxiao Yang, Yi Hu, Chenyu Wang, Chubin Zhang, Muhan Zhang, Lester Mackey, Tommi Jaakkola, Stephen Bates, Dinghuai Zhang · 2025

Loop Mechanism: flat-loop

Focus: architecture · training-algorithm · inference-algorithm

Domains: language-modeling · reasoning

TL;DR: Proposes Coevolutionary Continuous Discrete Diffusion, a joint continuous-discrete diffusion language model that repeatedly denoises latent and token states with one time-conditioned model, linking diffusion sampling to latent reasoning and looped-transformer expressivity.
[07/14/2025] Mixture-of-Recursions: Learning Dynamic Recursive Depths for Adaptive Token-Level Computation

Authors: Sangmin Bae, Yujin Kim, Reza Bayat, Sungnyun Kim, Jiyoun Ha, Tal Schuster, Adam Fisch, Hrayr Harutyunyan, Ziwei Ji, Aaron Courville, Se-Young Yun · 2025

Loop Mechanism: hierarchical-loop

Focus: architecture · inference-algorithm · training-algorithm

Domains: language-modeling · reasoning · efficiency

Community Comments: Reza Bayat reading list (#12)

TL;DR: Introduces Mixture-of-Recursions, a recursive transformer with token-level routing that adapts recursion depth and active-token attention so easy tokens exit early while hard tokens keep thinking.
[07/10/2025] Skip a Layer or Loop it? Test-Time Depth Adaptation of Pretrained LLMs

Authors: Ziyue Li, Yang Li, Tianyi Zhou · 2025

Loop Mechanism: flat-loop

Focus: architecture · inference-algorithm

Domains: language-modeling · reasoning

Community Comments: X Comment

TL;DR: Proposes Chain-of-Layers (CoLa), an inference-time search method that skips or repeats pretrained LLM layers per sample via MCTS to improve efficiency and reasoning accuracy.
[06/26/2025] Hierarchical Reasoning Model

Authors: Guan Wang, Jin Li, Yuhao Sun, Xing Chen, Changling Liu, Yue Wu, Meng Lu, Sen Song, Yasin Abbasi Yadkori · 2025

Loop Mechanism: hierarchical-loop · flat-loop

Focus: architecture · training-algorithm

Domains: reasoning · algorithmic-reasoning

TL;DR: Proposes HRM, a brain-inspired recurrent architecture with two coupled modules at different timescales: a high-level module for abstract planning and a low-level module for detailed execution.
[02/10/2025] Implicit Language Models are RNNs: Balancing Parallelization and Expressivity

Authors: Mark Schöne, Babak Rahmani, Heiner Kremer, Fabian Falck, Hitesh Ballani, Jannes Gladrow · ICML 2025

Loop Mechanism: implicit-layer

Focus: architecture · inference-algorithm

Domains: language-modeling · reasoning

TL;DR: Introduces implicit state-space language models that iterate a shared transition toward a fixed point, recovering RNN-like expressivity while retaining mostly parallel training.
[02/07/2025] 🌟 Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach

Authors: Jonas Geiping, Sean McLeish, Neel Jain, John Kirchenbauer, Siddharth Singh, Brian R. Bartoldson, Bhavya Kailkhura, Abhinav Bhatele, Tom Goldstein · NeurIPS 2025

Loop Mechanism: flat-loop

Focus: architecture

Domains: language-modeling · reasoning

Community Comments: Reza Bayat reading list (#9)

TL;DR: Presents Huginn, a recurrent-depth transformer (3.5B params) that iterates a single block up to 64 times per token, achieving strong reasoning performance that scales with additional test-time compute.
[10/28/2024] Relaxed Recursive Transformers: Effective Parameter Sharing with Layer-wise LoRA

Authors: Sangmin Bae, Adam Fisch, Hrayr Harutyunyan, Ziwei Ji, Seungyeon Kim, Tal Schuster · 2025

Loop Mechanism: hierarchical-loop

Focus: architecture · training-algorithm

Domains: language-modeling

Community Comments: Reza Bayat reading list (#11)

TL;DR: Presents Relaxed Recursive Transformers as a parameter-sharing conversion and uptraining recipe that turns pretrained LLMs into compact recursive models using layer tying and layer-wise LoRA while preserving performance and improving deployment efficiency.
[05/25/2024] MoEUT: Mixture-of-Experts Universal Transformers

Authors: Róbert Csordás, Kazuki Irie, Jürgen Schmidhuber, Christopher Potts, Christopher D. Manning · 2024

Loop Mechanism: flat-loop · hierarchical-loop

Focus: architecture · training-algorithm

Domains: language-modeling · reasoning · efficiency · MoE

TL;DR: Introduces MoEUT, a mixture-of-experts Universal Transformer that combines shared recurrent depth with expert routing to improve language modeling while using less compute and memory than comparable baselines.
[02/21/2024] AlgoFormer: An Efficient Transformer Framework with Algorithmic Structures

Authors: Yihang Gao, Chuanyang Zheng, Enze Xie, Han Shi, Tianyang Hu, Yu Li, Michael K. Ng, Zhenguo Li, Zhaoqiang Liu · 2024

Loop Mechanism: flat-loop

Focus: architecture · training-algorithm

Domains: language-modeling · algorithmic-reasoning

TL;DR: Introduces AlgoFormer, which splits computation into pre-, loop-, and post-transformer stages, showing that structured recurrent depth can outperform standard and vanilla looped transformers on algorithmic and language tasks.
[10/16/2023] CoTFormer: A Chain-of-Thought Driven Architecture with Budget-Adaptive Computation Cost at Inference

Authors: Amirkeivan Mohtashami, Matteo Pagliardini, Martin Jaggi · 2023

Loop Mechanism: flat-loop

Focus: architecture · inference-algorithm

Domains: language-modeling · reasoning

TL;DR: Recasts chain-of-thought as recurrent depth inside a token-level transformer, using token-wise adaptive computation to spend extra iterations only where additional reasoning budget helps.
[09/22/2022] A Generalist Neural Algorithmic Learner

Authors: Borja Ibarz, Vitaly Kurin, George Papamakarios, Kyriacos Nikiforou, Mehdi Bennani, Róbert Csordás, Andrew Dudzik, Matko Bošnjak, Alex Vitvitskyi, Yulia Rubanova, Andreea Deac, Beatrice Bevilacqua, Yaroslav Ganin, Charles Blundell, Petar Veličković · LoG 2022

Loop Mechanism: flat-loop

Focus: architecture · data

Domains: algorithmic-reasoning

TL;DR: Presents a single GNN model trained jointly on the 30 algorithms of the CLRS benchmark, demonstrating that a shared recurrent architecture can generalize across diverse algorithmic tasks.
[11/09/2021] On Training Implicit Models

Authors: Zhengyang Geng, Xin-Yu Zhang, Shaojie Bai, Yisen Wang, Zhouchen Lin · NeurIPS 2021

Loop Mechanism: implicit-layer

Focus: training-algorithm

Domains: efficiency

TL;DR: Proposes phantom gradient, a lightweight backpropagation estimator for implicit (infinite-depth) models that uses damped unrolling and a truncated Neumann series to speed backward passes while matching or surpassing exact-gradient baselines on large-scale tasks.
[06/15/2020] Multiscale Deep Equilibrium Models

Authors: Shaojie Bai, Vladlen Koltun, J. Zico Kolter · NeurIPS 2020

Loop Mechanism: implicit-layer · hierarchical-loop

Focus: architecture

Domains: vision

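TL;DR: Extends DEQ to multiscale hierarchical representations solved jointly as one equilibrium, achieving competitive performance on large-scale vision tasks such as ImageNet classification and Cityscapes semantic segmentation.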
[09/03/2019] Deep Equilibrium Models

Authors: Shaojie Bai, J. Zico Kolter, Vladlen Koltun · NeurIPS 2019

Loop Mechanism: implicit-layer

Focus: architecture · training-algorithm · inference-algorithm

TL;DR: Proposes solving directly for the fixed point of a weight-tied layer via root-finding and differentiating through it implicitly, yielding constant-memory models that are equivalent to infinitely deep weight-tied networks.
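As a rough illustration of the fixed-point idea behind implicit-layer entries such as this one, the sketch below iterates a single shared scalar map until it stops changing. It is a toy under stated assumptions, not the paper's method: the map `f`, the weight `w`, and the function names are hypothetical, and plain forward iteration stands in for the quasi-Newton root solver a real deep equilibrium model would typically use.

```python
import math


def f(z, x, w=0.5):
    """One application of the shared layer (a contractive scalar map, chosen for illustration)."""
    return math.tanh(w * z + x)


def solve_fixed_point(x, w=0.5, tol=1e-10, max_iter=1000):
    """Iterate z <- f(z, x) from z = 0 until successive iterates differ by less than tol."""
    z = 0.0
    for step in range(1, max_iter + 1):
        z_next = f(z, x, w)
        if abs(z_next - z) < tol:
            return z_next, step
        z = z_next
    return z, max_iter


# The equilibrium z* satisfies z* = f(z*, x), so the residual should be tiny.
z_star, steps = solve_fixed_point(x=0.3)
residual = abs(f(z_star, 0.3) - z_star)
print(f"z* = {z_star:.6f} after {steps} iterations, residual = {residual:.2e}")
```

Because `|w| < 1` and `tanh` has slope at most 1, the map is a contraction and the iteration converges from any starting point. Real models need other safeguards, since a learned layer is not contractive by default.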
[07/10/2018] Universal Transformers

Authors: Mostafa Dehghani, Stephan Gouws, Oriol Vinyals, Jakob Uszkoreit, Łukasz Kaiser · ICLR 2019

Loop Mechanism: flat-loop

Focus: architecture

Domains: language-modeling · algorithmic-reasoning

Community Comments: Reza Bayat reading list (#3)

TL;DR: Extends the standard Transformer with recurrent computation over depth via weight tying, which the authors show to be Turing-complete under certain assumptions, combining the parallelism of Transformers with the inductive bias of RNNs.
[03/29/2016] Adaptive Computation Time for Recurrent Neural Networks

Authors: Alex Graves · 2016

Loop Mechanism: flat-loop

Focus: architecture · inference-algorithm

Domains: sequence-modeling · efficiency

TL;DR: Introduces ACT, allowing RNNs to learn how many computational steps to take per input, laying the groundwork for dynamic-depth recurrent computation.
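The halting rule summarized in the ACT entry above can be sketched in a few lines. This is a minimal, hedged illustration: per-step halting probabilities are accumulated until they reach `1 - eps`, the last step used receives the remaining mass, and the output is the weighted average of the intermediate states. The function names, the example probabilities, and the scalar "states" are hypothetical, and the ponder-cost term used during training is omitted.

```python
def act_weights(halting_probs, eps=0.01):
    """Convert per-step halting probabilities into mixing weights that sum to 1.

    Steps are consumed until the running sum reaches 1 - eps (or the list ends);
    the final step used receives the remaining probability mass.
    """
    weights, total = [], 0.0
    for i, h in enumerate(halting_probs):
        is_last = i == len(halting_probs) - 1
        if total + h >= 1.0 - eps or is_last:
            weights.append(1.0 - total)
            break
        weights.append(h)
        total += h
    return weights


def act_output(states, halting_probs, eps=0.01):
    """Halting-weighted average of the intermediate states."""
    weights = act_weights(halting_probs, eps)
    return sum(w * s for w, s in zip(weights, states))


# Hypothetical halting probabilities for one input: the loop stops at step 3 of 4.
weights = act_weights([0.1, 0.3, 0.7, 0.9])
output = act_output([1.0, 2.0, 3.0, 4.0], [0.1, 0.3, 0.7, 0.9])
print(weights, output)
```

An input whose early halting probabilities are small keeps looping for more steps, which is the sense in which compute adapts to the input.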
[11/25/2015] Neural GPUs Learn Algorithms

Authors: Łukasz Kaiser, Ilya Sutskever · ICLR 2016

Loop Mechanism: flat-loop

Focus: architecture · training-algorithm

Domains: algorithmic-reasoning

TL;DR: Introduces Neural GPUs, a recurrent convolutional architecture that learns algorithms such as binary addition and multiplication by repeatedly applying a shared, highly parallel convolutional gated recurrent block.

Applications Focused

Applications Focused collects papers centered on applying loop models to concrete domains or tasks, including robotics, VLA, multimodal settings, tabular data, graph data, and other non-core benchmarks.

[06/03/2026] Test-Time Compute Scaling for ASR with Depth-Conditioned Looped Transformers

Authors: Yacouba Kaloga, Shashi Kumar, Shakeel A. Sheikh, Driss Khalil, Petr Motlicek, Ina Kodrasi · 2026

Loop Mechanism: flat-loop

Focus: architecture · inference-algorithm

Domains: speech-recognition · efficiency · scaling

TL;DR: Introduces LARM, a depth-conditioned looped Transformer for automatic speech recognition that reuses a shared acoustic-encoder block recurrently and scales recognition quality by increasing inference-time loop count.
[05/28/2026] Déjà View: Looping Transformers for Multi-View 3D Reconstruction

Authors: Alessandro Burzio, Tobias Fischer, Sven Elflein, Qunjie Zhou, Riccardo de Lutio, Jiawei Ren, Jiahui Huang, Shengyu Huang, Marc Pollefeys, Laura Leal-Taixé, Zan Gojcic, Haithem Turki · 2026

Loop Mechanism: flat-loop

Focus: architecture · inference-algorithm

Domains: vision · efficiency

TL;DR: Applies a single looped transformer block recurrently to per-view features for a variable number of refinement steps in multi-view 3D reconstruction, exposing loop count as an inference-time compute knob.
[05/27/2026] Recursive Vision Transformer with Dynamic Depth and Width Adjustment for Resource-Efficient Image Semantic Communication

Authors: Zhilong Zhang, Xinhui Zhang, Gongyu Jin, Sihua Wang, Danpu Liu, Changchuan Yin · 2026

Loop Mechanism: flat-loop

Focus: architecture · inference-algorithm

Domains: vision · efficiency

TL;DR: Uses a recursive ViT structure to iteratively refine semantic features for image semantic communication while dynamically adjusting recursive depth and width under image and channel conditions.
[05/19/2026] i-DEQ: A stable inertial deep equilibrium model for image restoration

Authors: Antonin Clerc, Marien Renaud, Baudouin Denis De Seneville, Nicolas Papadakis · 2026

Loop Mechanism: implicit-layer

Focus: architecture · inference-algorithm · training-algorithm

Domains: vision · efficiency

TL;DR: Introduces i-DEQ, an inertial deep-equilibrium image-restoration model that learns explicit nonconvex regularization and uses momentum in fixed-point iterations, improving stability and robustness while roughly halving DEQ inference time.
[05/19/2026] Nonlocal operator learning for fMRI encoding and decoding tasks

Authors: Andreas Kramer, Saugat Acharya, Alice Giola, Emanuele Zappala · 2026

Loop Mechanism: implicit-layer

Focus: architecture · inference-algorithm

Domains: sequence-modeling

TL;DR: Applies a latent neural integral-operator model to fMRI encoding and decoding, using fixed-point iterations in an auxiliary latent space before downstream classification or stimulus prediction.
[05/18/2026] PERL: Parameter Efficient Reasoning in CLIP Latent Space

Authors: Simone Carnemolla, Salvatore Calcagno, Daniela Giordano, Concetto Spampinato, Matteo Pennisi · 2026

Loop Mechanism: flat-loop

Focus: architecture · training-algorithm · inference-algorithm

Domains: vision · reasoning · efficiency

TL;DR: Introduces PERL, a few-shot CLIP adaptation framework that reuses a compact shared reasoning module across latent refinement steps, improving base-to-novel, transfer, and OOD results with about 6K trainable parameters.
[05/12/2026] Recurrent Transformer-Based Near- and Far-Field THz Wideband Channel Estimation for UM-MIMO

Authors: Dmitry Artemasov, Alexander Shmatok, Kirill Andreev, Alexey Frolov, Manjesh K. Hanawal, Nikola Zlatanov · 2026

Loop Mechanism: flat-loop

Focus: architecture · inference-algorithm

Domains: sequence-modeling · efficiency

TL;DR: Applies a block-recurrent transformer to hybrid near/far-field THz UM-MIMO channel estimation, training one state-memory transformer block once and iteratively reusing it to improve narrowband and wideband NMSE.
[04/30/2026] ITS-Mina: A Harris Hawks Optimization-Based All-MLP Framework with Iterative Refinement and External Attention for Multivariate Time Series Forecasting

Authors: Pourya Zamanvaziri, Amirhossein Sadr, Aida Pakniyat, Dara Rahmati · 2026

Loop Mechanism: flat-loop

Focus: architecture · inference-algorithm

Domains: sequence-modeling · efficiency

TL;DR: Applies a shared-parameter iterative refinement module inside an all-MLP multivariate time-series forecasting system, using the loop-model pattern for a concrete forecasting application.
[04/13/2026] A Deep Equilibrium Network for Hyperspectral Unmixing

Authors: Chentong Wang, Jincheng Gao, Fei Zhu, Jie Chen · 2026

Loop Mechanism: implicit-layer

Focus: architecture · training-algorithm

Domains: hyperspectral-imaging

TL;DR: Recasts hyperspectral unmixing as a deep equilibrium model, replacing the reconstruction-gradient operator with a trainable convolutional update and solving for an implicit fixed point with constant-memory differentiation.
[02/08/2026] Recurrent-Depth VLA: Implicit Test-Time Compute Scaling of Vision-Language-Action Models via Latent Iterative Reasoning

Authors: Yalcin Tur, Jalal Naghiyev, Haoquan Fang, Wei-Chuan Tsai, Jiafei Duan, Dieter Fox, Ranjay Krishna · 2026

Loop Mechanism: flat-loop

Focus: architecture · training-algorithm · inference-algorithm

Domains: robotics-vla

TL;DR: Introduces RD-VLA, a vision-language-action architecture with a weight-tied recurrent action head and adaptive stopping, enabling latent test-time compute scaling for robotics with constant memory footprint.
[02/05/2026] On the Role of Iterative Computation in Reinforcement Learning

Authors: Raj Ghugare, Michał Bortkiewicz, Alicja Ziarko, Benjamin Eysenbach · 2026

Loop Mechanism: flat-loop

Focus: architecture · inference-algorithm · training-algorithm

Domains: rl-control

TL;DR: Formalizes compute-bounded RL policies and introduces a minimal variable-compute architecture, showing that extra iterative computation improves performance and longer-horizon generalization across 31 online and offline RL tasks.

Blogs

Long-form technical posts, essays, and deep-dives about loop models. Blogs can carry Loop Mechanism / focus / domain tags but stay in a single flat section rather than the paper taxonomy.

[04/29/2026] Exact Input Writes Improve Stable Looped Language Models

Authors: Benhao Huang · Personal Blog 2026

Loop Mechanism: flat-loop

Focus: architecture · training-algorithm

Domains: language-modeling · theory

TL;DR: Proposes replacing Parcae's Euler input-write gain with the exact zero-order-hold gain, then reports matched 140M looped-language-model controls where Exact-ZOH lowers validation loss under both short-budget probes and an 11.2B-token paper-style run.
[04/21/2026] Claude Mythos, Looped LLM, and the Depth Scaling Axis

Authors: Rui-Jie Zhu · X Article 2026

Loop Mechanism: flat-loop

Focus: architecture

Domains: language-modeling · reasoning · scaling

TL;DR: Analyzes why Claude Mythos-like gains suggest a depth-scaling axis for looped LLMs and discusses stability, inference efficiency, and iso-FLOPs constraints for scaling loop architectures.
[04/19/2026] Loop-Model FLOPs and Memory in an Ablation Chain

Authors: Benhao Huang · Personal Blog 2026

Loop Mechanism: flat-loop

Focus: training-algorithm

Domains: FLOPs-efficiency · memory-efficiency · theory

TL;DR: Builds a clean cost-ablation chain for loop-model training, comparing shared versus non-shared weights, per-step losses, detach, instant updates, internal truncation, and gradient checkpointing, then checks the resulting FLOPs and memory trade-offs with a toy benchmark.
[04/19/2026] On the Looped Transformers Controversy

Authors: Chris Hayduk · X Article 2026

Loop Mechanism: flat-loop

Focus: architecture

Domains: language-modeling · reasoning · scaling

TL;DR: Argues that benchmark patterns and serving-compute constraints make deterministic weight-tied looping a plausible explanation for Claude Mythos-like gains, while explicitly framing the claim as speculation rather than confirmation.
[01/12/2026] Looped-GPT: Looping During Pre-training improves Generalization

Authors: Sunny Sanyal · Personal Blog 2026

Loop Mechanism: flat-loop

Focus: architecture · training-algorithm

Domains: language-modeling · scaling · efficiency

TL;DR: Introduces Looped-GPT, a reverse-residual depth-recurrent GPT variant, and reports pre-training experiments where looped models improve generalization under matched parameter, token, and fixed-FLOPs settings.
[01/07/2020] Adaptive Computation Time (ACT) in Neural Networks [3/3]

Authors: Grigory Sapunov · Medium 2020

Loop Mechanism: flat-loop

Focus: architecture · inference-algorithm

Domains: language-modeling · algorithmic-reasoning · efficiency

TL;DR: Reviews Adaptive Computation Time in transformer-style models, focusing on Universal Transformers with dynamic per-position halting, adaptive attention span, and ALBERT-style cross-layer parameter sharing as related forms of adaptive or repeated computation.

Contributing

We welcome additions, corrections, and scope challenges.

The preferred PR Submission Guide workflow is:

Open the PR Submission Guide
Reuse the searchable Loop Mechanism (mechanism_tags) / focus_tags / domain_tags, then fill the alias tags manually only if needed
Review the generated path and generated YAML locally
Fork the repo on GitHub to your own account
Create a branch in your fork, create the generated file path, paste the generated YAML, and open a pull request

The guide generates YAML for papers/ or blogs/ directly. For blogs, the filename should follow blogs/YYYY-MM-DD-shortname.yaml. Blogs here should be substantive long-form technical posts, not short announcements or marketing pages.
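Before opening a pull request, the blog filename convention above can be checked locally. The snippet below is a minimal sketch, not part of the repository's tooling: the helper name `is_valid_blog_path`, the example filenames, and the assumption that `shortname` is limited to lowercase letters, digits, and hyphens are all hypothetical.

```python
import re
from datetime import date

# Assumed shape: blogs/YYYY-MM-DD-shortname.yaml, with a lowercase, hyphenated shortname.
BLOG_PATH = re.compile(r"^blogs/(\d{4})-(\d{2})-(\d{2})-([a-z0-9][a-z0-9-]*)\.yaml$")


def is_valid_blog_path(path):
    """Return True if path matches the pattern and the date part is a real calendar date."""
    match = BLOG_PATH.match(path)
    if match is None:
        return False
    year, month, day = (int(group) for group in match.groups()[:3])
    try:
        date(year, month, day)  # raises ValueError for dates such as 2026-13-01
    except ValueError:
        return False
    return True


print(is_valid_blog_path("blogs/2026-04-29-exact-input-writes.yaml"))
print(is_valid_blog_path("blogs/2026-13-01-bad-month.yaml"))
```

If the repository accepts other characters in `shortname`, only the character class in the regular expression needs to change.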

See CONTRIBUTING.md, TAXONOMY.md, and TAGS.md for details.

Maintained by huskydoge. README auto-generated from papers/*.yaml and blogs/*.yaml by scripts/build.py.
