Search, filter, and explore loop-model papers and selected technical blogs with links to arXiv, code, OpenReview, and more.
Use the PR Submission Guide to generate YAML for papers or blogs, then copy the path and YAML into your fork / branch for the final pull request step.
A curated list of papers and selected long-form technical blogs on Loop Models — architectures where, within a single forward process, a shared learned internal layer, block, module, or operator is reused.
- 2026-04-24 — Awesome Loop Models is released. Announcement
This repository uses a strict definition:
By "loop model," we mean that, within a single forward pass of a model, a shared learned internal layer, block, module, or operator is reused.
This repo therefore includes papers that focus on loop models themselves, their mechanisms, applications, and designs. It excludes papers that are primarily about broader-scale iteration patterns that do not directly connect to loop models as defined above, such as agent loops, repeated full-model calls, external solver rounds, energy-based models, or plain sequence-time recurrence.
Admittedly, loop models are deeply connected to the broader field of architecture and algorithm design (Diffusion, Energy-Based Models, etc.). We also welcome work that explicitly connects adjacent topics to loop models.
Only the rightmost end of this scale is in scope for the main paper list.
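To make the definition above concrete, here is a minimal toy sketch of a loop model's forward pass. It is illustrative only and not taken from any listed paper: the names `shared_block` and `loop_forward`, the fixed weights, and the choice to re-inject the input at every step are all assumptions.

```python
import math

def shared_block(h, x, w_h=0.5, w_x=1.0):
    # One toy update. w_h and w_x stand in for learned weights; they are
    # fixed here, and every loop step reuses the same values.
    return [math.tanh(w_h * h_i + w_x * x_i) for h_i, x_i in zip(h, x)]

def loop_forward(x, n_loops=4):
    # A single forward pass: the same block is applied n_loops times to an
    # internal state h. Re-injecting x at each step is an assumption.
    h = [0.0] * len(x)
    for _ in range(n_loops):
        h = shared_block(h, x)
    return h

print(loop_forward([0.2, -1.0, 0.7], n_loops=4))
```

By contrast, calling a whole model several times from an outer agent loop would not count under this definition, because the reuse happens outside the forward pass.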
The public browsing layer uses exactly three flat paper categories:
- Theoretical and Mechanical Analysis — analytical papers whose main reader takeaway is understanding: theory, mechanism analysis, probing, diagnostics, or formal properties
- Architecture and Algorithm Designs — papers that propose loop-model architectures or algorithms, often for better performance, efficiency, training, inference, or memory use
- Applications Focused — papers whose main reader takeaway is loop-model performance on concrete external domains or tasks, such as robotics, VLA, multimodal tasks, tabular data, or graph data
In addition, selected long-form technical posts live in a separate flat Blogs section. Blogs can carry tags, but they do not use the paper taxonomy.
The paper categories are intentionally coarse. Foundation status plus Loop Mechanism / focus / domain tags carry secondary structure without introducing a separate lineage-tag axis.
Top-level categories do the minimum amount of work. Finer distinctions are pushed into:
- Loop Mechanism (mechanism_tags): loop-form labels only (hierarchical-loop, flat-loop, parallel-loop, or implicit-layer)
- focus_tags: whether the paper mainly studies objective-loss, training-algorithm, architecture, data, or inference-algorithm
- domain_tags: problem/domain labels such as language-modeling, robotics-vla, multimodal, tabular-data, or graph-data
- tags: optional aliases or model identifiers kept in YAML / README metadata, such as DEQ, UT, ACT, or Ouro
A paper can also carry foundation: true as a secondary badge when it is a canonical anchor such as ACT, Universal Transformers, or DEQ. Foundation is no longer a separate top-level shelf.
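As a rough illustration of how these tag axes fit together, the sketch below checks one entry against the tag vocabularies named above. It is not the repository's actual tooling: `check_entry` and the `title` key are hypothetical, the entry values are placeholders, and treating the mechanism and focus lists as closed sets is an assumption (TAGS.md remains the authoritative inventory).

```python
# Vocabularies copied from the tag descriptions above; assumed to be closed sets.
MECHANISM_TAGS = {"hierarchical-loop", "flat-loop", "parallel-loop", "implicit-layer"}
FOCUS_TAGS = {"objective-loss", "training-algorithm", "architecture",
              "data", "inference-algorithm"}

def check_entry(entry):
    """Return a list of problems found in one paper entry (empty if none)."""
    problems = []
    for tag in entry.get("mechanism_tags", []):
        if tag not in MECHANISM_TAGS:
            problems.append(f"unknown mechanism tag: {tag}")
    for tag in entry.get("focus_tags", []):
        if tag not in FOCUS_TAGS:
            problems.append(f"unknown focus tag: {tag}")
    # foundation is a secondary badge, so it should be a plain boolean.
    if not isinstance(entry.get("foundation", False), bool):
        problems.append("foundation must be true or false")
    return problems

entry = {
    "title": "Example paper (placeholder)",  # hypothetical key and value
    "mechanism_tags": ["flat-loop"],
    "focus_tags": ["architecture"],
    "domain_tags": ["language-modeling"],    # open vocabulary, not checked here
    "tags": [],
    "foundation": False,
}
print(check_entry(entry))
```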
In the interactive browser, the visible tag filters are Loop Mechanism, focus_tags, and domain_tags. Alias-style tags are not shown as browser filter chips there.
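The filtering behavior described above could be sketched as follows. This is a guess at the semantics, not the browser's real code: `matches` is a hypothetical helper, the two papers are placeholders, and the rule "any selected tag within an axis, every axis with a selection" is an assumption.

```python
# Only these three axes are exposed as filter chips; alias-style `tags` are not.
FILTER_AXES = ("mechanism_tags", "focus_tags", "domain_tags")

def matches(entry, selected):
    # `selected` maps an axis name to the set of chips the reader picked.
    # Assumption: an entry passes an axis if it carries at least one selected
    # tag on that axis, and it must pass every axis that has a selection.
    for axis in FILTER_AXES:
        wanted = selected.get(axis)
        if wanted and not wanted & set(entry.get(axis, [])):
            return False
    return True

papers = [  # placeholder entries, not real repository data
    {"title": "Paper A", "mechanism_tags": ["flat-loop"],
     "focus_tags": ["architecture"], "domain_tags": ["reasoning"], "tags": ["Ouro"]},
    {"title": "Paper B", "mechanism_tags": ["implicit-layer"],
     "focus_tags": ["inference-algorithm"], "domain_tags": ["theory"], "tags": ["DEQ"]},
]
selected = {"mechanism_tags": {"flat-loop"}, "domain_tags": {"reasoning", "theory"}}
print([p["title"] for p in papers if matches(p, selected)])  # prints ['Paper A']
```

Selections on the alias `tags` axis are ignored in this sketch, mirroring the note that aliases are not shown as filter chips.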
See TAGS.md for the current tag inventory and preferred spellings before proposing a new tag.
See TAXONOMY.md for the full inclusion rule, paper category definitions, tie-break rules, and the flat Blogs-section rule.
- Theoretical and Mechanical Analysis (23)
- Architecture and Algorithm Designs (61)
- Applications Focused (11)
- Blogs (6)
The paper shelves are intentionally coarse: Theoretical and Mechanical Analysis, Architecture and Algorithm Designs, and Applications Focused. Foundation status plus Loop Mechanism / focus / domain tags carry secondary structure without introducing lineage buckets. Blogs are a separate flat section: they can carry tags, but they do not use the paper taxonomy.
Theoretical and Mechanical Analysis collects papers whose primary contribution is analysis: why loop models work, what formal properties they have, and what mechanisms they exhibit.
-
[05/30/2026] Looped Transformers with Layer Normalization Provably Learn the Power Method
Authors: Lyumin Wu, Chenyang Zhang, Yuan Cao · 2026
Loop Mechanism: flat-loop
Focus: architecture · training-algorithm
Domains: theory · algorithmic-reasoning
TL;DR: Proves that a looped linear transformer with layer normalization, trained only for principal component prediction, converges to a solution implementing the power method, with each self-attention layer performing one power iteration.
-
[05/29/2026] Chain-of-Thought and Compressed Looped Transformers: A Memory-Budget Separation
Authors: Haozhou Zhang · 2026
Loop Mechanism: flat-loop
Focus: architecture · inference-algorithm
Domains: language-modeling · reasoning · theory · memory-efficiency
TL;DR: Compares chain-of-thought scratchpads with compressed looped Transformers, arguing that looped hidden-state recurrence is bounded by its persistent memory budget even when more recurrent computation is applied.
-
[05/26/2026] Stabilizing Recurrent Dynamics for Test-Time Scalable Latent Reasoning in Looped Language Models
Authors: Xiao-Wen Yang, Ziyu Han, Xi-Hua Zhang, Wen-Da Wei, Jie-Jing Shao, Lan-Zhe Guo, Yu-Feng Li · 2026
Loop Mechanism: flat-loop
Focus: training-algorithm · inference-algorithm
Domains: language-modeling · reasoning · theory · scaling
TL;DR: Analyzes why Looped Language Models can collapse at larger recurrence depths and proposes STARS, a spectral-radius-regularized training framework that pushes latent dynamics toward stable fixed points for reliable test-time scaling.
-
[05/20/2026] Interaction Locality in Hierarchical Recursive Reasoning
Authors: Yosuke Miyanishi, Tetsuro Morimura · 2026
Loop Mechanism: hierarchical-loop · flat-loop
Focus: architecture · inference-algorithm
Domains: reasoning · algorithmic-reasoning
TL;DR: Proposes interaction locality as a mechanistic measurement framework for HRM and TRM, showing how repeated recursive updates accumulate local writes into broader solution structure on grid reasoning benchmarks.
-
[05/18/2026] One Model, Two Roles: Emergent Specialization in a Shared Recurrent Transformer
Authors: Jucheng Shen, Barbara Su, Anastasios Kyrillidis · 2026
Loop Mechanism: flat-loop · hierarchical-loop
Focus: architecture · inference-algorithm
Domains: reasoning · algorithmic-reasoning
TL;DR: Analyzes Asymmetric Input Recurrence, a two-state shared-weight recurrent Transformer where the same model updates L/H states, showing that state identity and input-injection asymmetry induce distinct proposal-vs-uncertainty roles on Sudoku-Extreme and Maze.
-
[05/08/2026] Bifurcation Models: Learning Set-Valued Solution Maps with Weight-Tied Dynamics
Authors: Caleb Jore, Jialin Liu · 2026
Loop Mechanism: flat-loop · implicit-layer
Focus: architecture · inference-algorithm
Domains: theory · algorithmic-reasoning
TL;DR: Studies weight-tied dynamics for set-valued solution maps, proving that regular equilibrium dynamics can represent multiple branches while repeated shared-operator iterations discover multiple valid equilibria on Ising and Allen-Cahn tasks.
-
[05/07/2026] Is One Layer Enough? Understanding Inference Dynamics in Tabular Foundation Models
Authors: Amir Rezaei Balef, Mykhailo Koshil, Katharina Eggensperger · ICML 2026
Loop Mechanism: flat-loop
Focus: architecture · inference-algorithm
Domains: tabular-data · reasoning
TL;DR: Analyzes layerwise inference dynamics in tabular foundation models and uses the observed depth redundancy to build a looped single-layer model that preserves comparable performance with about 20% of the original parameters.
-
[05/07/2026] Transformers Efficiently Perform In-Context Logistic Regression via Normalized Gradient Descent
Authors: Chenyang Zhang, Yuan Cao · 2026
Loop Mechanism: flat-loop
Focus: architecture · inference-algorithm · training-algorithm
Domains: theory · reasoning
TL;DR: Proves that softmax transformers can implement in-context logistic regression by treating layers as normalized-gradient-descent steps, then trains one self-attention layer and applies it recurrently as a looped model with convergence and OOD guarantees.
-
[04/28/2026] On Halting vs Converging in Recurrent Graph Neural Networks
Authors: Jeroen Bollen, Stijn Vansummeren · 2026
Loop Mechanism: flat-loop
Focus: architecture · inference-algorithm
Domains: theory · algorithmic-reasoning
TL;DR: Analyzes recurrent graph neural networks that repeatedly apply message passing until convergence or halting, proving expressiveness relationships between converging, output-converging, and halting RGNN variants.
-
[04/23/2026] Universal Transformers Need Memory: Depth-State Trade-offs in Adaptive Recursive Reasoning
Authors: Grigory Sapunov · 2026
Loop Mechanism: flat-loop
Focus: architecture · inference-algorithm
Domains: reasoning · algorithmic-reasoning
TL;DR: Studies a single-block Universal Transformer with ACT on Sudoku-Extreme, showing that learned memory tokens are necessary for non-trivial recursive-depth reasoning and that ACT initialization can trap the model in shallow computation.
-
[04/22/2026] How Much Is One Recurrence Worth? Iso-Depth Scaling Laws for Looped Language Models
Authors: Kristian Schwethelm, Daniel Rueckert, Georgios Kaissis · 2026
Loop Mechanism: flat-loop
Focus: architecture
Domains: language-modeling · scaling · efficiency
TL;DR: Measures the parameter value of recurrence in looped language models with iso-depth scaling laws, estimating how extra recurrent passes trade off against unique depth and training compute.
-
[04/16/2026] Stability and Generalization in Looped Transformers
Authors: Asher Labovich · 2026
Loop Mechanism: flat-loop · implicit-layer
Focus: inference-algorithm
Domains: reasoning · theory
TL;DR: Analyzes stability and generalization in looped transformers through a fixed-point framework, characterizing when recall and normalization yield reachable, input-dependent, and trainable loop dynamics.
-
[04/15/2026] Hierarchical vs. Flat Iteration in Shared-Weight Transformers
Authors: Sang-Il Han · 2026
Loop Mechanism: flat-loop · hierarchical-loop
Focus: architecture
Domains: language-modeling · scaling
TL;DR: Empirically compares hierarchical shared-weight recurrence against flat shared-weight iteration and independent-layer stacking, revealing a persistent representational gap for the recurrent hierarchy.
-
[04/13/2026] A Mechanistic Analysis of Looped Reasoning Language Models
Authors: Hugh Blayney, Álvaro Arroyo, Johan Obando-Ceron, Pablo Samuel Castro, Aaron Courville, Michael M. Bronstein, Xiaowen Dong · 2026
Loop Mechanism: implicit-layer
Focus: inference-algorithm
Domains: language-modeling · reasoning
TL;DR: Analyzes looped reasoning LLMs mechanistically, showing recurrent cycles converge to layer-specific fixed points and that feedforward-like inference stages repeat across latent recurrences.
-
[04/10/2026] Relational Preference Encoding in Looped Transformer Internal States
Authors: Jan Kirin · 2026
Loop Mechanism: flat-loop
Focus: training-algorithm · architecture
Domains: language-modeling · alignment
TL;DR: Probes looped transformer hidden states during iterative refinement, showing that human-preference information is encoded primarily in relational differences between loop states rather than independent per-state scores.
-
[04/09/2026] Loop, Think, & Generalize: Implicit Reasoning in Recurrent-Depth Transformers
Authors: Harsh Kohli, Srinivasan Parthasarathy, Huan Sun, Yuekun Yao · 2026
Loop Mechanism: flat-loop · implicit-layer
Focus: inference-algorithm
Domains: language-modeling · reasoning
TL;DR: Studies implicit reasoning in recurrent-depth transformers, showing that iterating shared transformer layers can unlock systematic generalization and depth extrapolation while also exposing overthinking limits.
-
[02/05/2026] Inverse Depth Scaling From Most Layers Being Similar
Authors: Yizhou Liu, Sara Kangaslahti, Ziming Liu, Jeff Gore · 2026
Loop Mechanism: flat-loop
Focus: architecture
Domains: language-modeling · theory
Community Comments: X Comment
TL;DR: Analyzes LLMs and toy residual networks to show loss scales inversely with depth when many layers are functionally similar and primarily reduce error via ensemble averaging.
-
[09/27/2025] Two-Scale Latent Dynamics for Recurrent-Depth Transformers
Authors: Francesco Pappone, Donato Crisostomi, Emanuele Rodolà · 2025
Loop Mechanism: flat-loop
Focus: inference-algorithm
Domains: language-modeling · reasoning
TL;DR: Analyzes recurrent-depth transformers through a two-scale latent-dynamics lens, showing shrinking and increasingly orthogonal loop updates and deriving a second-order early-exit criterion that improves latency-quality trade-offs.
-
[07/02/2025] Latent Chain-of-Thought? Decoding the Depth-Recurrent Transformer
Authors: Wenquan Lu, Yuechuan Yang, Kyle Lee, Yanshu Li, Enqi Liu · 2025
Loop Mechanism: flat-loop
Focus: inference-algorithm
Domains: language-modeling · reasoning
TL;DR: Probes a depth-recurrent Transformer to test whether latent chain-of-thought structure emerges across recurrence steps, finding limited evidence and recurrence-depth-dependent interpretability effects.
-
[02/24/2025] Reasoning with Latent Thoughts: On the Power of Looped Transformers
Authors: Nikunj Saunshi, Nishanth Dikkala, Zhiyuan Li, Sanjiv Kumar, Sashank J. Reddi · ICLR 2025
Loop Mechanism: flat-loop
Focus: training-algorithm · inference-algorithm
Domains: language-modeling · reasoning
Community Comments: Reza Bayat reading list (#7)
TL;DR: Studies looped transformers as reasoning models, showing effective-depth scaling, latent-thought simulation of chain-of-thought, and a looping-based regularizer that improves the reasoning-versus-memorization trade-off.
-
[10/02/2024] On Expressive Power of Looped Transformers: Theoretical Analysis and Enhancement via Timestep Encoding
Authors: Kevin Xu, Issei Sato · 2024
Loop Mechanism: flat-loop
Focus: architecture
Domains: language-modeling · reasoning · theory
TL;DR: Analyzes the expressive power of looped transformers, derives approximation-rate limits, and shows that timestep encoding improves their function-approximation behavior.
-
[11/21/2023] Looped Transformers are Better at Learning Learning Algorithms
Authors: Liu Yang, Kangwook Lee, Robert Nowak, Dimitris Papailiopoulos · ICLR 2024
Loop Mechanism: flat-loop
Focus: architecture · training-algorithm
Domains: algorithmic-reasoning
Community Comments: Benhao's reading note · Reza Bayat reading list (#5)
TL;DR: Proposes looped-transformer training for in-context data-fitting tasks, showing comparable performance to standard transformers with under 10% of the parameters by better matching iterative learning algorithms.
-
[01/30/2023] Looped Transformers as Programmable Computers
Authors: Angeliki Giannou, Shashank Rajput, Jy-yong Sohn, Kangwook Lee, Jason D. Lee, Dimitris Papailiopoulos · 2023
Loop Mechanism: flat-loop
Focus: architecture · inference-algorithm
Domains: algorithmic-reasoning
Community Comments: Reza Bayat reading list (#4)
TL;DR: Shows that a shallow looped transformer can emulate instruction-set computation and iterative algorithms such as SGD or matrix inversion, with the recurrence acting as a reusable program counter.
Architecture and Algorithm Designs collects the constructive side of the field: new looped architectures, algorithms, recurrent computation graphs, and efficiency or memory-compression methods.
-
[06/03/2026] LoopMoE: Unifying Iterative Computation with Mixture-of-Experts for Language Modeling
Authors: Wenkai Chen, Tianshu Li, Wenyong Huang, Yichun Yin, Lifeng Shang, Chengwei Qin · 2026
Loop Mechanism: flat-loop
Focus: architecture · training-algorithm
Domains: language-modeling · efficiency · scaling
TL;DR: Introduces LoopMoE, a looped mixture-of-experts language model that combines sparse routing with iterative weight-shared computation through iteration-conditioned modulation and capacity balancing.
-
[05/31/2026] CART: Context-Anchored Recurrent Transformer -- A Parameter-Efficient Architecture with Learned Stability
Authors: Chad A. Capps · 2026
Loop Mechanism: flat-loop
Focus: architecture · training-algorithm
Domains: language-modeling · efficiency · scaling
TL;DR: Introduces a compact language model that reuses a single shared transformer core across depth while anchoring recurrence to precomputed key-value tensors and reports a mostly negative parameter-parity result against dense baselines.
-
[05/29/2026] Fixed-Point Masked Generative Modeling
Authors: Andrea Miele, Yiming Qin, Alba Carballo-Castro, Justin Deschenaux, Pascal Frossard · 2026
Loop Mechanism: implicit-layer
Focus: architecture · training-algorithm · inference-algorithm
Domains: language-modeling · vision · efficiency
TL;DR: Replaces part of a masked generative model denoiser with a fixed-point solver over shared attention layers, using consistency training and solver-state reuse to adapt depth with fewer parameters.
-
[05/27/2026] CosmicFish-HRM: Adaptive Reasoning via Hierarchical Recurrent Mechanisms in Compact Language Models
Authors: Venkat Akhil Lakkapragada · 2026
Loop Mechanism: hierarchical-loop
Focus: architecture · inference-algorithm
Domains: language-modeling · reasoning · efficiency
TL;DR: Explores a compact autoregressive language model with a Hierarchical Reasoning Module that iterates through high-level and low-level reasoning cycles and learns input-dependent halting behavior for adaptive reasoning depth.
-
[05/25/2026] Looped Diffusion Language Models
Authors: Sanghyun Lee, Chunsan Hong, Seungryong Kim, Jonghyun Lee, Jongho Park, Dongmin Park · 2026
Loop Mechanism: flat-loop
Focus: architecture · training-algorithm · inference-algorithm
Domains: language-modeling · reasoning · efficiency · scaling
TL;DR: Introduces LoopMDM, selectively looping early-middle transformer layers in masked diffusion language models so training gains depth-scaling without extra parameters and inference can vary loop count for compute scaling.
-
[05/22/2026] Training-Free Looped Transformers
Authors: Lizhang Chen, Jonathan Li, Chen Liang, Ni Lao, Qiang Liu · 2026
Loop Mechanism: flat-loop
Focus: architecture · inference-algorithm
Domains: language-modeling · reasoning · efficiency · scaling
TL;DR: Retrofits frozen pretrained transformers with a training-free inference wrapper that repeatedly applies a contiguous mid-stack layer block as damped refinement sub-steps, improving several QA and reasoning benchmarks without fine-tuning.
-
[05/20/2026] Equilibrium Reasoners: Learning Attractors Enables Scalable Reasoning
Authors: Benhao Huang, Zhengyang Geng, Zico Kolter · ICML 2026
Loop Mechanism: flat-loop · implicit-layer
Focus: architecture · inference-algorithm
Domains: reasoning · algorithmic-reasoning · scaling
TL;DR: Formalizes Equilibrium Reasoners as learned latent dynamical systems whose repeated update rule converges toward task-conditioned attractors, enabling depth and breadth test-time scaling for reasoning.
-
[05/20/2026] LT2: Linear-Time Looped Transformers
Authors: Chunyuan Deng, Yizhe Zhang, Rui-Jie Zhu, Yuanyuan Xu, Jiarui Liu, T. S. Eugene Ng, Hanjie Chen · 2026
Loop Mechanism: flat-loop
Focus: architecture · inference-algorithm
Domains: language-modeling · reasoning · efficiency · scaling
TL;DR: Introduces LT2, a looped-transformer family that replaces quadratic attention with linear or sparse attention so repeated loop steps refine memory and expand receptive field while keeping inference more scalable.
-
[05/19/2026] Generative Recursive Reasoning
Authors: Junyeob Baek, Mingyu Jo, Minsu Kim, Mengye Ren, Yoshua Bengio, Sungjin Ahn · 2026
Loop Mechanism: flat-loop · parallel-loop
Focus: architecture · objective-loss · training-algorithm · inference-algorithm
Domains: reasoning · algorithmic-reasoning
TL;DR: Introduces GRAM, a probabilistic recursive-reasoning framework that models reasoning as stochastic latent trajectories, enabling multi-hypothesis computation, variational training, and inference-time scaling through depth and parallel sampling.
-
[05/19/2026] Probabilistic Tiny Recursive Model
Authors: Amin Sghaier, Ali Parviz, Alexia Jolicoeur-Martineau · 2026
Loop Mechanism: hierarchical-loop · flat-loop · parallel-loop
Focus: inference-algorithm
Domains: reasoning · algorithmic-reasoning · efficiency
TL;DR: Introduces PTRM, an inference-time scaling framework for Tiny Recursive Models that injects Gaussian noise into recursive latent updates, runs parallel trajectories, and selects the final answer with the model's Q head without retraining.
-
[05/18/2026] HRM-Text: Efficient Pretraining Beyond Scaling
Authors: Guan Wang, Changling Liu, Chenyu Wang, Cai Zhou, Yuhao Sun, Yifei Wu, Shuai Zhen, Luca Scimeca, Yasin Abbasi Yadkori · Preprint 2026
Loop Mechanism: hierarchical-loop · flat-loop
Focus: architecture · training-algorithm · objective-loss · data
Domains: language-modeling · reasoning · efficiency
TL;DR: Introduces HRM-Text, a 1B Hierarchical Recurrent Model language model that combines dual-timescale recurrent Transformer modules with MagicNorm, warmup deep credit assignment, PrefixLM masking, and task-completion pretraining for efficient training from 40B unique tokens.
-
[05/15/2026] Looped SSMs: Depth-Recurrence and Input Reshaping for Time Series Classification
Authors: Mónika Farsang, Ramin Hasani, Daniela Rus, Radu Grosu · 2026
Loop Mechanism: flat-loop
Focus: architecture · inference-algorithm
Domains: sequence-modeling · efficiency · scaling
TL;DR: Extends looped-transformer depth recurrence to state-space models by reusing the same SSM block across depth and adding input reshaping, showing tied-depth SSMs match or beat untied SSMs on six time-series benchmarks despite fewer parameters.
-
[05/12/2026] Solve the Loop: Attractor Models for Language and Reasoning
Authors: Jacob Fein-Ashley, Paria Rashidinejad · 2026
Loop Mechanism: flat-loop · implicit-layer
Focus: architecture · training-algorithm · inference-algorithm
Domains: language-modeling · reasoning · scaling · efficiency
TL;DR: Introduces Attractor Models, where a backbone proposes output embeddings and an attractor module iteratively solves a fixed point with implicit differentiation, improving looped language modeling and small-model reasoning while allowing adaptive convergence-depth inference.
-
[05/11/2026] Simply Stabilizing the Loop via Fully Looped Transformer
Authors: Rao Fu, Zixuan Yang, Jiankun Zhang, Jing Ma, Hechang Chen, Yu Li, Yi Chang · 2026
Loop Mechanism: flat-loop
Focus: architecture · training-algorithm · inference-algorithm
Domains: language-modeling · reasoning · scaling · efficiency
TL;DR: Stabilizes looped transformers with parameter-free fully looped signal routing and attention injection, enabling stable training at higher loop counts while preserving test-time loop-depth control.
-
[05/10/2026] LoopUS: Recasting Pretrained LLMs into Looped Latent Refinement Models
Authors: Taekhyun Park, Yongjae Lee, Dohee Kim, Hyerim Bae · 2026
Loop Mechanism: flat-loop
Focus: architecture · training-algorithm · inference-algorithm
Domains: language-modeling · reasoning · efficiency · scaling
TL;DR: Converts pretrained LLMs into encoder, looped reasoning block, and decoder components, using selective gating, random deep supervision, and adaptive early exiting to stabilize latent looping without training recurrent models from scratch.
-
[05/09/2026] Quantum Injection Pathways for Implicit Graph Neural Networks
Authors: Pengyuan Xu, Tristan Zaborniak, Luis F. Rivera, Hausi A. Müller · 2026
Loop Mechanism: implicit-layer
Focus: architecture · inference-algorithm
Domains: theory · efficiency
TL;DR: Formulates quantum-signal injection pathways for graph deep-equilibrium models, comparing fixed, state-dependent, and backbone-dependent coupling inside the fixed-point operator with contraction guarantees and graph-classification experiments.
-
[05/09/2026] Sparse Layers are Critical to Scaling Looped Language Models
Authors: Ryan Lee, Jacob Biloki, Edward J. Hu, Jonathan May · 2026
Loop Mechanism: flat-loop
Focus: architecture · inference-algorithm
Domains: language-modeling · scaling · efficiency · MoE
TL;DR: Shows that MoE-style sparse layers can make looped language models scale better than dense looped transformers, with routing divergence across repeated shared layers recovering expressivity and loop boundaries serving as effective early-exit points.
-
[05/08/2026] Memory-Efficient Looped Transformer: Decoupling Compute from Memory in Looped Language Models
Authors: Victor Conchello Vendrell, Arnau Padres Masdemont, Niccolò Grillo, Jordi Ros-Giralt, Arash Behboodi, Fabio Valerio Massoli · 2026
Loop Mechanism: flat-loop
Focus: architecture · training-algorithm
Domains: language-modeling · reasoning · efficiency · memory-efficiency
TL;DR: Memory-Efficient Looped Transformer enables constant-memory iterative reasoning by sharing a single KV cache across loops, achieving strong performance without the linear memory scaling of prior looped LLMs.
-
[04/23/2026] Hyperloop Transformers
Authors: Abbas Zeitoun, Lucas Torroba-Hennigen, Yoon Kim · 2026
Loop Mechanism: flat-loop
Focus: architecture
Domains: language-modeling · efficiency · memory-efficiency
Community Comments: Turing Posts
TL;DR: Introduces Hyperloop Transformers, a parameter-efficient looped Transformer that applies only a middle block recurrently and adds hyper-connections between loops to improve memory-efficient language modeling.
-
[04/20/2026] One Step Forward and K Steps Back: Better Reasoning with Denoising Recursion Models
Authors: Chris Cameron, Wangzheng Wang, Nikita Ivanov, Ashmita Bhattacharyya, Didier Chételat, Yingxue Zhang · 2026
Loop Mechanism: flat-loop
Focus: training-algorithm · inference-algorithm · architecture
Domains: reasoning · algorithmic-reasoning
TL;DR: Introduces Denoising Recursion Models, a looped-transformer training method that corrupts targets and trains recursive refinement over multiple steps, improving ARC-AGI reasoning over TRM.
-
[04/19/2026] LASER: Low-Rank Activation SVD for Efficient Recursion
Authors: Ege Çakar, Ketan Ali Raghu, Lia Zheng · 2026
Loop Mechanism: hierarchical-loop
Focus: architecture · inference-algorithm
Domains: efficiency
TL;DR: Analyzes Tiny Recursive Model activation geometry during recursive unrolling and introduces LASER, a dynamic low-rank activation compression method that cuts recursive activation memory by ~60% without statistically significant accuracy loss.
-
[04/14/2026] 🌟 Parcae: Scaling Laws For Stable Looped Language Models
Authors: Hayden Prairie, Zachary Novack, Taylor Berg-Kirkpatrick, Daniel Y. Fu · 2026
Loop Mechanism: flat-loop
Focus: objective-loss · architecture
Domains: language-modeling · reasoning
Community Comments: Benhao's reading note
TL;DR: Introduces Parcae, a stable looped language model that constrains injection spectral norms to prevent instability and studies isoFLOPs-style training- and test-time scaling laws for quality gains under fixed-parameter budgets.
-
[04/10/2026] ELT: Elastic Looped Transformers for Visual Generation
Authors: Sahil Goyal, Swayam Agrawal, Gautham Govind Anil, Prateek Jain, Sujoy Paul, Aditya Kusupati · 2026
Loop Mechanism: flat-loop
Focus: architecture · training-algorithm · inference-algorithm
Domains: vision · efficiency
Community Comments: Tweet by Grigory Sapunov · Grigory Sapunov's reading notes
TL;DR: Introduces Elastic Looped Transformers for image and video generation, using weight-shared recurrent transformer blocks plus Intra-Loop Self Distillation to support any-time inference with dynamic quality-compute trade-offs from a single training run.
-
[03/23/2026] Thinking Deeper, Not Longer: Depth-Recurrent Transformers for Compositional Generalization
Authors: Hung-Hsuan Chen · 2026
Loop Mechanism: flat-loop
Focus: architecture · inference-algorithm
Domains: reasoning · compositional-reasoning
TL;DR: Introduces a depth-recurrent Transformer for compositional generalization, with silent thinking, LayerScale, and identity-biased recurrence enabling stable deep latent iteration.
-
[03/20/2026] LoopRPT: Reinforcement Pre-Training for Looped Language Models
Authors: Guo Tang, Shixin Jiang, Heng Chang, Nuo Chen, Yuhan Li, Huiming Fan, Jia Li, Ming Liu, Bing Qin · 2026
Loop Mechanism: flat-loop
Focus: objective-loss · training-algorithm
Domains: language-modeling · reasoning · RL
TL;DR: Proposes LoopRPT, a reinforcement pre-training method for looped language models that assigns learning signals to latent iterations, improving accuracy-compute trade-offs and strengthening early-stage reasoning on Ouro.
-
[03/09/2026] Adaptive Loops and Memory in Transformers: Think Harder or Know More?
Authors: Markus Frey, Behzad Shomali, Ali Hamza Bashir, David Berghaus, Joachim Koehler, Mehdi Ali · 2026
Loop Mechanism: flat-loop
Focus: architecture
Domains: language-modeling · reasoning · efficiency
TL;DR: Introduces transformers with adaptive per-layer looping and gated memory banks, showing that combining learned halting with extra storage improves reasoning under matched parameter and FLOP budgets.
-
[03/09/2026] Tiny Autoregressive Recursive Models
Authors: Paulius Rauba, Claudio Fanconi, Mihaela van der Schaar · 2026
Loop Mechanism: hierarchical-loop · flat-loop
Focus: architecture · training-algorithm · inference-algorithm
Domains: algorithmic-reasoning · language-modeling
Community Comments: Benhao's reading note
TL;DR: Studies autoregressive Tiny Recursive Models under compute-matched baselines, finding that simple two-step refinement helps on small algorithmic tasks while the full Autoregressive TRM shows no reliable gains.
-
[03/05/2026] Mixture of Universal Experts: Scaling Virtual Width via Depth-Width Transformation
Authors: Yilong Chen, Naibin Gu, Junyuan Shang, Zhenyu Zhang, Yuchen Feng, Jiawei Sheng, Tingwen Liu, Shuohuan Wang, Yu Sun, Hua Wu, Haifeng Wang · 2026
Loop Mechanism: flat-loop
Focus: objective-loss · architecture · inference-algorithm
Domains: language-modeling · efficiency · MoE
Community Comments: Benhao's reading note
TL;DR: Proposes MOUE, which reuses a universal layer-agnostic expert pool across layers to transform depth into virtual width and improve MoE performance under fixed activation budgets.
-
[03/05/2026] Recursive Inference Machines for Neural Reasoning
Authors: Mieszko Komisarczyk, Saurabh Mathur, Maurice Kraus, Sriraam Natarajan, Kristian Kersting · 2026
Loop Mechanism: hierarchical-loop
Focus: architecture · inference-algorithm
Domains: reasoning · RL
Community Comments: Benhao's reading note
TL;DR: Introduces Recursive Inference Machines, a recurrent reasoning framework that casts TRMs as a special case and improves ARC-AGI, Sudoku, and tabular classification by reweighting the history of loop states.
-
[03/02/2026] AdaPonderLM: Gated Pondering Language Models with Token-Wise Adaptive Depth
Authors: Shixiang Song, He Li, Zitong Wang, Boyi Zeng, Feichen Song, Yixuan Wang, Zhiqin John Xu, Ziwei He, Zhouhan Lin · 2026
Loop Mechanism: flat-loop
Focus: architecture · training-algorithm · inference-algorithm
Domains: language-modeling · reasoning · efficiency
TL;DR: Introduces AdaPonderLM, a self-supervised recurrent language model with token-wise halting gates and KV reuse, allocating more loop steps to hard tokens under a fixed compute budget.
-
[02/12/2026] SpiralFormer: Looped Transformers Can Learn Hierarchical Dependencies via Multi-Resolution Recursion
Authors: Chengting Yu, Xiaobo Shu, Yadao Wang, Yizhen Zhang, Haoyi Wu, You Wu, Rujiao Long, Ziheng Chen, Yuchi Xu, Wenbo Su, Bo Zheng · 2026
Loop Mechanism: hierarchical-loop
Focus: architecture · inference-algorithm
Domains: language-modeling · reasoning
TL;DR: Introduces SpiralFormer, a looped transformer that applies shared layers under a multi-resolution recursion schedule to learn hierarchical dependencies more efficiently than fixed-resolution recurrent baselines.
-
[02/11/2026] LoopFormer: Elastic-Depth Looped Transformers for Latent Reasoning via Shortcut Modulation
Authors: Ahmadreza Jeddi, Marco Ciccone, Babak Taati · ICLR 2026
Loop Mechanism: flat-loop
Focus: architecture · training-algorithm · inference-algorithm
Domains: language-modeling · reasoning · efficiency
TL;DR: Introduces LoopFormer, trained on variable-length trajectories to enable budget-conditioned reasoning, with shortcut-consistency regularization that keeps internal trajectories stable across different loop depths.
-
[02/11/2026] Prioritize the Process, Not Just the Outcome: Rewarding Latent Thought Trajectories Improves Reasoning in Looped Language Models
Authors: Jonathan Williams, Esin Tureci · 2026
Loop Mechanism: flat-loop
Focus: objective-loss · training-algorithm
Domains: language-modeling · reasoning
TL;DR: Introduces RLTT, a reinforcement-learning objective that assigns reward across the full latent thought trajectory of looped language models rather than only the final latent state.
-
[02/09/2026] Looping Back to Move Forward: Recursive Transformers for Efficient and Flexible Large Multimodal Models
Authors: Ruihan Xu, Yuting Gao, Lan Wang, Jianing Li, Weihao Chen, Qingpei Guo, Ming Yang, Shiliang Zhang · 2026
Loop Mechanism: hierarchical-loop
Focus: architecture · inference-algorithm
Domains: vision · efficiency
TL;DR: Introduces RecursiveVLM, a recursive multimodal transformer with a recursive connector and monotonic recursion loss that enables on-demand extra refinement under varying compute budgets.
-
[02/09/2026] Understanding Dynamic Compute Allocation in Recurrent Transformers
Authors: Ibraheem Muhammad Moosa, Suhas Lohit, Ye Wang, Moitreya Chatterjee, Wenpeng Yin · 2026
Loop Mechanism: flat-loop
Focus: architecture · inference-algorithm
Domains: language-modeling · algorithmic-reasoning · efficiency
Community Comments: Benhao's reading note
TL;DR: Proposes ANIRA, a recurrent Transformer framework for per-token variable-depth computation, and shows adaptive compute can align with token complexity while failing to extrapolate to longer algorithmic inputs.
-
[01/29/2026] Depth-Recurrent Attention Mixtures: Giving Latent Reasoning the Attention it Deserves
Authors: Jonas Knupp, Jan Hendrik Metzen, Jeremias Bohn, Georg Groh, Kristian Kersting · 2026
Loop Mechanism: parallel-loop · flat-loop
Focus: architecture · inference-algorithm
Domains: language-modeling · reasoning · efficiency
Community Comments: Benhao's reading note
TL;DR: Introduces a modular framework combining sequence attention and depth attention for recurrent-depth models, improving FLOP-, parameter-, and memory-efficiency simultaneously.
-
[01/26/2026] ChainGPT: Dual-Reasoning Model with Recurrent Depth and Multi-Rank State Updates
Authors: Yunao Zheng, Xiaojie Wang, Lei Ren, Chen Wei · ICLR 2026
Loop Mechanism: flat-loop
Focus: architecture · training-algorithm · inference-algorithm
Domains: language-modeling · reasoning
TL;DR: Introduces ChainGPT, a dual-reasoning recurrent-depth architecture that combines multi-substep state updates and state-guided sparse attention to move reasoning into latent computation, with adaptive stopping as a supporting mechanism.
-
[01/26/2026] MoDr: Mixture-of-Depth-Recurrent Transformers for Test-Time Reasoning
Authors: Xiaojing Zhang, Haifeng Wu, Gang He, Jiyang Shen, Bochen Lyu, Zhanxing Zhu · ICLR 2026
Loop Mechanism: flat-loop
Focus: architecture · inference-algorithm · training-algorithm
Domains: language-modeling · reasoning · efficiency · MoE
TL;DR: Introduces MoDr, which adds multi-branch routing to a depth-recurrent Transformer so looped models can explore solution paths more adaptively at test time.
-
[12/16/2025] Universal Reasoning Model
Authors: Zitian Gao, Lynx Chen, Yihao Xiao, He Xing, Ran Tao, Haoming Luo, Joey Zhou, Bryan Dai · 2025
Loop Mechanism: flat-loop
Focus: architecture · training-algorithm
Domains: algorithmic-reasoning · reasoning
TL;DR: Proposes URM, a Universal Transformer-based architecture with weight tying that beats standard transformers on reasoning benchmarks through iterative depth computation.
-
[11/11/2025] Think-at-Hard: Selective Latent Iterations to Improve Reasoning Language Models
Authors: Tianyu Fu, Yichen You, Zekai Chen, Guohao Dai, Huazhong Yang, Yu Wang · 2025
Loop Mechanism: flat-loop
Focus: architecture · inference-algorithm
Domains: language-modeling · reasoning · efficiency
TL;DR: Introduces Think-at-Hard, a dynamic latent-thinking method that uses a learned decider to apply extra recurrent latent iterations only to hard tokens, with LoRA refiners and duo-causal attention across iteration depth.
-
[11/10/2025] Teaching Pretrained Language Models to Think Deeper with Retrofitted Recurrence
Authors: Sean McLeish, Ang Li, John Kirchenbauer, Dayal Singh Kalra, Brian R. Bartoldson, Bhavya Kailkhura, Avi Schwarzschild, Jonas Geiping, Tom Goldstein, Micah Goldblum · 2025
Loop Mechanism: flat-loop
Focus: architecture · training-algorithm
Domains: efficiency · language-modeling · reasoning
TL;DR: A framework for retrofitting pretrained feedforward language models with depth recurrence, improving training efficiency for depth-recurrent models and enabling greater FLOP efficiency than comparable feedforward models.
-
[10/29/2025] 🌟 Scaling Latent Reasoning via Looped Language Models
Authors: Rui-Jie Zhu, Zixuan Wang, Kai Hua, Tianyu Zhang, Ziniu Li, Haoran Que, Boyi Wei, Zixin Wen, Fan Yin, He Xing, Lu Li, Jiajun Shi, Kaijing Ma, Shanda Li, Taylor Kergan, Andrew Smith, Xingwei Qu, Mude Hui, Bohong Wu, Qiyang Min, Hongzhi Huang, Xun Zhou, Wei Ye, Jiaheng Liu, Jian Yang, Yunfeng Shi, Chenghua Lin, Enduo Zhao, Tianle Cai, Ge Zhang, Wenhao Huang, Yoshua Bengio, Jason Eshraghian · 2025
Loop Mechanism: flat-loop
Focus: objective-loss · architecture · data · training-algorithm · inference-algorithm
Domains: language-modeling · reasoning
Community Comments: Reza Bayat reading list (#10)
TL;DR: Introduces Ouro, a family of pre-trained Looped Language Models (1.4B and 2.6B) that match the performance of 12B standard LLMs. Establishes loop depth as a third scaling axis beyond model size and data.
-
[10/28/2025] Parallel Loop Transformer for Efficient Test-Time Computation Scaling
Authors: Bohong Wu, Mengzhao Chen, Xiang Luo, Shen Yan, Qifan Yu, Fan Xia, Tianqi Zhang, Hongrui Zhan, Zheng Zhong, Xun Zhou, Siyuan Qiao, Xingyan Bin · 2025
Loop Mechanism: parallel-loop · flat-loop
Focus: inference-algorithm
Domains: language-modeling · reasoning · efficiency
TL;DR: Introduces the Parallel Loop Transformer, which preserves looped-model accuracy while reducing latency and memory through cross-loop parallelism and shared-loop KV representations.
-
[10/06/2025] Less is More: Recursive Reasoning with Tiny Networks
Authors: Alexia Jolicoeur-Martineau · 2025
Loop Mechanism: hierarchical-loop · flat-loop
Focus: architecture · inference-algorithm · training-algorithm
Domains: reasoning
TL;DR: Proposes Tiny Recursive Model (TRM), a single tiny network that recursively refines latent state and answer over multiple improvement steps, outperforming HRM and many larger models on ARC-AGI-style reasoning tasks.
-
[10/03/2025] Coevolutionary Continuous Discrete Diffusion: Make Your Diffusion Language Model a Latent Reasoner
Authors: Cai Zhou, Chenxiao Yang, Yi Hu, Chenyu Wang, Chubin Zhang, Muhan Zhang, Lester Mackey, Tommi Jaakkola, Stephen Bates, Dinghuai Zhang · 2025
Loop Mechanism: flat-loop
Focus: architecture · training-algorithm · inference-algorithm
Domains: language-modeling · reasoning
TL;DR: Proposes Coevolutionary Continuous Discrete Diffusion, a joint continuous-discrete diffusion language model that repeatedly denoises latent and token states with one time-conditioned model, linking diffusion sampling to latent reasoning and looped-transformer expressivity.
-
[07/14/2025] Mixture-of-Recursions: Learning Dynamic Recursive Depths for Adaptive Token-Level Computation
Authors: Sangmin Bae, Yujin Kim, Reza Bayat, Sungnyun Kim, Jiyoun Ha, Tal Schuster, Adam Fisch, Hrayr Harutyunyan, Ziwei Ji, Aaron Courville, Se-Young Yun · 2025
Loop Mechanism: hierarchical-loop
Focus: architecture · inference-algorithm · training-algorithm
Domains: language-modeling · reasoning · efficiency
Community Comments: Reza Bayat reading list (#12)
TL;DR: Introduces Mixture-of-Recursions, a recursive transformer with token-level routing that adapts recursion depth and active-token attention so easy tokens exit early while hard tokens keep thinking.
-
[07/10/2025] Skip a Layer or Loop it? Test-Time Depth Adaptation of Pretrained LLMs
Authors: Ziyue Li, Yang Li, Tianyi Zhou · 2025
Loop Mechanism: flat-loop
Focus: architecture · inference-algorithm
Domains: language-modeling · reasoning
Community Comments: X Comment
TL;DR: Proposes Chain-of-Layers (CoLa), an inference-time search method that skips or repeats pretrained LLM layers per sample via MCTS to improve efficiency and reasoning accuracy.
-
[06/26/2025] Hierarchical Reasoning Model
Authors: Guan Wang, Jin Li, Yuhao Sun, Xing Chen, Changling Liu, Yue Wu, Meng Lu, Sen Song, Yasin Abbasi Yadkori · 2025
Loop Mechanism: hierarchical-loop · flat-loop
Focus: architecture · training-algorithm
Domains: reasoning · algorithmic-reasoning
TL;DR: Proposes HRM, a brain-inspired recurrent architecture with two coupled modules at different timescales: a high-level module for abstract planning and a low-level module for detailed execution.
-
[02/10/2025] Implicit Language Models are RNNs: Balancing Parallelization and Expressivity
Authors: Mark Schöne, Babak Rahmani, Heiner Kremer, Fabian Falck, Hitesh Ballani, Jannes Gladrow · ICML 2025
Loop Mechanism: implicit-layer
Focus: architecture · inference-algorithm
Domains: language-modeling · reasoning
TL;DR: Introduces implicit state-space language models that iterate a shared transition toward a fixed point, recovering RNN-like expressivity while retaining mostly parallel training.
-
[02/07/2025] 🌟 Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach
Authors: Jonas Geiping, Sean McLeish, Neel Jain, John Kirchenbauer, Siddharth Singh, Brian R. Bartoldson, Bhavya Kailkhura, Abhinav Bhatele, Tom Goldstein · NeurIPS 2025
Loop Mechanism: flat-loop
Focus: architecture
Domains: language-modeling · reasoning
Community Comments: Reza Bayat reading list (#9)
TL;DR: Presents Huginn, a recurrent-depth transformer (3.5B params) that iterates a single block up to 64 times per token, achieving strong reasoning performance that scales with additional test-time compute.
-
[10/28/2024] Relaxed Recursive Transformers: Effective Parameter Sharing with Layer-wise LoRA
Authors: Sangmin Bae, Adam Fisch, Hrayr Harutyunyan, Ziwei Ji, Seungyeon Kim, Tal Schuster · 2025
Loop Mechanism: hierarchical-loop
Focus: architecture · training-algorithm
Domains: language-modeling
Community Comments: Reza Bayat reading list (#11)
TL;DR: Presents Relaxed Recursive Transformers as a parameter-sharing conversion and uptraining recipe that turns pretrained LLMs into compact recursive models using layer tying and layer-wise LoRA while preserving performance and improving deployment efficiency.
-
[05/25/2024] MoEUT: Mixture-of-Experts Universal Transformers
Authors: Róbert Csordás, Kazuki Irie, Jürgen Schmidhuber, Christopher Potts, Christopher D. Manning · 2024
Loop Mechanism: flat-loop · hierarchical-loop
Focus: architecture · training-algorithm
Domains: language-modeling · reasoning · efficiency · MoE
TL;DR: Introduces MoEUT, a mixture-of-experts Universal Transformer that combines shared recurrent depth with expert routing to improve language modeling while using less compute and memory than comparable baselines.
-
[02/21/2024] AlgoFormer: An Efficient Transformer Framework with Algorithmic Structures
Authors: Yihang Gao, Chuanyang Zheng, Enze Xie, Han Shi, Tianyang Hu, Yu Li, Michael K. Ng, Zhenguo Li, Zhaoqiang Liu · 2024
Loop Mechanism: flat-loop
Focus: architecture · training-algorithm
Domains: language-modeling · algorithmic-reasoning
TL;DR: Splits computation into pre-, loop-, and post-transformer stages, showing that structured recurrent depth can outperform standard and vanilla looped transformers on algorithmic and language tasks.
-
[10/16/2023] CoTFormer: A Chain-of-Thought Driven Architecture with Budget-Adaptive Computation Cost at Inference
Authors: Amirkeivan Mohtashami, Matteo Pagliardini, Martin Jaggi · 2023
Loop Mechanism: flat-loop
Focus: architecture · inference-algorithm
Domains: language-modeling · reasoning
TL;DR: Recasts chain-of-thought as recurrent depth inside a token-level transformer, using token-wise adaptive computation to spend extra iterations only where additional reasoning budget helps.
-
[09/22/2022] A Generalist Neural Algorithmic Learner
Authors: Borja Ibarz, Vitaly Kurin, George Papamakarios, Kyriacos Nikiforou, Mehdi Bennani, Róbert Csordás, Andrew Dudzik, Matko Bošnjak, Alex Vitvitskyi, Yulia Rubanova, Andreea Deac, Beatrice Bevilacqua, Yaroslav Ganin, Charles Blundell, Petar Veličković · LoG 2022
Loop Mechanism: flat-loop
Focus: architecture · data
Domains: algorithmic-reasoning
TL;DR: Presents a single GNN model trained on 30+ algorithms from the CLRS benchmark, demonstrating that a shared recurrent architecture can generalize across diverse algorithmic tasks.
-
[11/09/2021] On Training Implicit Models
Authors: Zhengyang Geng, Xin-Yu Zhang, Shaojie Bai, Yisen Wang, Zhouchen Lin · NeurIPS 2021
Loop Mechanism: implicit-layer
Focus: training-algorithm
Domains: efficiency
TL;DR: Proposes phantom gradient, a lightweight backpropagation estimator for implicit (infinite-depth) models that uses damped unrolling and a truncated Neumann series to speed backward passes while matching or surpassing exact-gradient baselines on large-scale tasks.
-
[06/15/2020] Multiscale Deep Equilibrium Models
Authors: Shaojie Bai, Vladlen Koltun, J. Zico Kolter · NeurIPS 2020
Loop Mechanism: implicit-layer · hierarchical-loop
Focus: architecture
Domains: vision
TL;DR: Extends DEQ to multiscale hierarchical representations, achieving competitive performance on large-scale vision tasks.
-
[09/03/2019] Deep Equilibrium Models
Authors: Shaojie Bai, J. Zico Kolter, Vladlen Koltun · NeurIPS 2019
Loop Mechanism: implicit-layer
Focus: architecture · training-algorithm · inference-algorithm
TL;DR: Proposes to directly solve for the fixed point of an infinite-depth network, enabling implicit-depth models that are memory-efficient and theoretically equivalent to infinitely deep recurrent networks.
-
[07/10/2018] Universal Transformers
Authors: Mostafa Dehghani, Stephan Gouws, Oriol Vinyals, Jakob Uszkoreit, Łukasz Kaiser · ICLR 2019
Loop Mechanism: flat-loop
Focus: architecture
Domains: language-modeling · algorithmic-reasoning
Community Comments: Reza Bayat reading list (#3)
TL;DR: Extends the standard Transformer with recurrent computation over depth via weight tying, enabling Turing-complete computation and combining the parallelism of Transformers with the inductive bias of RNNs.
-
[03/29/2016] Adaptive Computation Time for Recurrent Neural Networks
Authors: Alex Graves · 2016
Loop Mechanism: flat-loop
Focus: architecture · inference-algorithm
Domains: sequence-modeling · efficiency
TL;DR: Introduces ACT, allowing RNNs to learn how many computational steps to take per input, laying the groundwork for dynamic-depth recurrent computation.
-
[11/25/2015] Neural GPUs Learn Algorithms
Authors: Łukasz Kaiser, Ilya Sutskever · ICLR 2016
Loop Mechanism: flat-loop
Focus: architecture · training-algorithm
Domains: algorithmic-reasoning
TL;DR: Introduces Neural GPUs, a recurrent convolutional architecture that learns parallel algorithms like addition and multiplication through repeated application of a shared convolutional recurrent block.
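Most entries above are tagged flat-loop: one shared block is applied several times inside a single forward pass, and the loop count can be changed at inference time. A minimal sketch of that pattern in plain Python follows. It is illustrative only and is not taken from any listed paper; `make_shared_block`, `flat_loop`, and the toy tanh update are hypothetical names and choices.

```python
import math
from typing import Callable, List

Vector = List[float]


def make_shared_block(weight: float, bias: float) -> Callable[[Vector], Vector]:
    """Build one toy block whose parameters are fixed once and reused on every call."""

    def block(state: Vector) -> Vector:
        # Residual update with a tanh nonlinearity (an arbitrary illustrative choice).
        return [x + math.tanh(weight * x + bias) for x in state]

    return block


def flat_loop(block: Callable[[Vector], Vector], state: Vector, num_steps: int) -> Vector:
    """Apply the same block num_steps times within one forward pass."""
    state = list(state)  # copy so the caller's input is not mutated
    for _ in range(num_steps):
        state = block(state)  # same parameters at every depth step
    return state


if __name__ == "__main__":
    block = make_shared_block(weight=0.5, bias=0.1)
    # The loop count acts as a compute knob: more steps, same parameters.
    for steps in (1, 4, 8):
        print(steps, flat_loop(block, [1.0, -2.0], steps))
```

The point of the sketch is only that depth comes from repetition rather than from additional parameters; real loop models replace the toy block with a learned layer or stack of layers.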
Applications Focused collects papers centered on applying loop models to concrete domains or tasks, including robotics, VLA, multimodal settings, tabular data, graph data, and other non-core benchmarks.
-
[06/03/2026] Test-Time Compute Scaling for ASR with Depth-Conditioned Looped Transformers
Authors: Yacouba Kaloga, Shashi Kumar, Shakeel A. Sheikh, Driss Khalil, Petr Motlicek, Ina Kodrasi · 2026
Loop Mechanism: flat-loop
Focus: architecture · inference-algorithm
Domains: speech-recognition · efficiency · scaling
TL;DR: Introduces LARM, a depth-conditioned looped Transformer for automatic speech recognition that reuses a shared acoustic-encoder block recurrently and scales recognition quality by increasing inference-time loop count.
-
[05/28/2026] Déjà View: Looping Transformers for Multi-View 3D Reconstruction
Authors: Alessandro Burzio, Tobias Fischer, Sven Elflein, Qunjie Zhou, Riccardo de Lutio, Jiawei Ren, Jiahui Huang, Shengyu Huang, Marc Pollefeys, Laura Leal-Taixé, Zan Gojcic, Haithem Turki · 2026
Loop Mechanism: flat-loop
Focus: architecture · inference-algorithm
Domains: vision · efficiency
TL;DR: Applies a single looped transformer block recurrently to per-view features for a variable number of refinement steps in multi-view 3D reconstruction, exposing loop count as an inference-time compute knob.
-
[05/27/2026] Recursive Vision Transformer with Dynamic Depth and Width Adjustment for Resource-Efficient Image Semantic Communication
Authors: Zhilong Zhang, Xinhui Zhang, Gongyu Jin, Sihua Wang, Danpu Liu, Changchuan Yin · 2026
Loop Mechanism: flat-loop
Focus: architecture · inference-algorithm
Domains: vision · efficiency
TL;DR: Uses a recursive ViT structure to iteratively refine semantic features for image semantic communication while dynamically adjusting recursive depth and width under image and channel conditions.
-
[05/19/2026] i-DEQ: A stable inertial deep equilibrium model for image restoration
Authors: Antonin Clerc, Marien Renaud, Baudouin Denis De Seneville, Nicolas Papadakis · 2026
Loop Mechanism: implicit-layer
Focus: architecture · inference-algorithm · training-algorithm
Domains: vision · efficiency
TL;DR: Introduces i-DEQ, an inertial deep-equilibrium image-restoration model that learns explicit nonconvex regularization and uses momentum in fixed-point iterations, improving stability and robustness while roughly halving DEQ inference time.
-
[05/19/2026] Nonlocal operator learning for fMRI encoding and decoding tasks
Authors: Andreas Kramer, Saugat Acharya, Alice Giola, Emanuele Zappala · 2026
Loop Mechanism: implicit-layer
Focus: architecture · inference-algorithm
Domains: sequence-modeling
TL;DR: Applies a latent neural integral-operator model to fMRI encoding and decoding, using fixed-point iterations in an auxiliary latent space before downstream classification or stimulus prediction.
-
[05/18/2026] PERL: Parameter Efficient Reasoning in CLIP Latent Space
Authors: Simone Carnemolla, Salvatore Calcagno, Daniela Giordano, Concetto Spampinato, Matteo Pennisi · 2026
Loop Mechanism: flat-loop
Focus: architecture · training-algorithm · inference-algorithm
Domains: vision · reasoning · efficiency
TL;DR: Introduces PERL, a few-shot CLIP adaptation framework that reuses a compact shared reasoning module across latent refinement steps, improving base-to-novel, transfer, and OOD results with about 6K trainable parameters.
-
[05/12/2026] Recurrent Transformer-Based Near- and Far-Field THz Wideband Channel Estimation for UM-MIMO
Authors: Dmitry Artemasov, Alexander Shmatok, Kirill Andreev, Alexey Frolov, Manjesh K. Hanawal, Nikola Zlatanov · 2026
Loop Mechanism: flat-loop
Focus: architecture · inference-algorithm
Domains: sequence-modeling · efficiency
TL;DR: Applies a block-recurrent transformer to hybrid near/far-field THz UM-MIMO channel estimation, training one state-memory transformer block once and iteratively reusing it to improve narrowband and wideband NMSE.
-
[04/30/2026] ITS-Mina: A Harris Hawks Optimization-Based All-MLP Framework with Iterative Refinement and External Attention for Multivariate Time Series Forecasting
Authors: Pourya Zamanvaziri, Amirhossein Sadr, Aida Pakniyat, Dara Rahmati · 2026
Loop Mechanism: flat-loop
Focus: architecture · inference-algorithm
Domains: sequence-modeling · efficiency
TL;DR: Applies a shared-parameter iterative refinement module inside an all-MLP multivariate time-series forecasting system, using the loop-model pattern for a concrete forecasting application.
-
[04/13/2026] A Deep Equilibrium Network for Hyperspectral Unmixing
Authors: Chentong Wang, Jincheng Gao, Fei Zhu, Jie Chen · 2026
Loop Mechanism: implicit-layer
Focus: architecture · training-algorithm
Domains: hyperspectral-imaging
TL;DR: Recasts hyperspectral unmixing as a deep equilibrium model, replacing the reconstruction-gradient operator with a trainable convolutional update and solving for an implicit fixed point with constant-memory differentiation.
-
[02/08/2026] Recurrent-Depth VLA: Implicit Test-Time Compute Scaling of Vision-Language-Action Models via Latent Iterative Reasoning
Authors: Yalcin Tur, Jalal Naghiyev, Haoquan Fang, Wei-Chuan Tsai, Jiafei Duan, Dieter Fox, Ranjay Krishna · 2026
Loop Mechanism: flat-loop
Focus: architecture · training-algorithm · inference-algorithm
Domains: robotics-vla
TL;DR: Introduces RD-VLA, a vision-language-action architecture with a weight-tied recurrent action head and adaptive stopping, enabling latent test-time compute scaling for robotics with constant memory footprint.
-
[02/05/2026] On the Role of Iterative Computation in Reinforcement Learning
Authors: Raj Ghugare, Michał Bortkiewicz, Alicja Ziarko, Benjamin Eysenbach · 2026
Loop Mechanism: flat-loop
Focus: architecture · inference-algorithm · training-algorithm
Domains: rl-control
TL;DR: Formalizes compute-bounded RL policies and introduces a minimal variable-compute architecture, showing that extra iterative computation improves performance and longer-horizon generalization across 31 online and offline RL tasks.
Long-form technical posts, essays, and deep-dives about loop models. Blogs can carry Loop Mechanism / focus / domain tags but stay in a single flat section rather than the paper taxonomy.
-
[04/29/2026] Exact Input Writes Improve Stable Looped Language Models
Authors: Benhao Huang · Personal Blog 2026
Loop Mechanism: flat-loop
Focus: architecture · training-algorithm
Domains: language-modeling · theory
TL;DR: Proposes replacing Parcae's Euler input-write gain with the exact zero-order-hold gain, then reports matched 140M looped-language-model controls where Exact-ZOH lowers validation loss under both short-budget probes and an 11.2B-token paper-style run.
-
[04/21/2026] Claude Mythos, Looped LLM, and the Depth Scaling Axis
Authors: Rui-Jie Zhu · X Article 2026
Loop Mechanism: flat-loop
Focus: architecture
Domains: language-modeling · reasoning · scaling
TL;DR: Analyzes why Claude Mythos-like gains suggest a depth-scaling axis for looped LLMs and discusses stability, inference efficiency, and iso-FLOPs constraints for scaling loop architectures.
-
[04/19/2026] Loop-Model FLOPs and Memory in an Ablation Chain
Authors: Benhao Huang · Personal Blog 2026
Loop Mechanism: flat-loop
Focus: training-algorithm
Domains: FLOPs-efficiency · memory-efficiency · theory
TL;DR: Builds a clean cost-ablation chain for loop-model training, comparing shared versus non-shared weights, per-step losses, detach, instant updates, internal truncation, and gradient checkpointing, then checks the resulting FLOPs and memory trade-offs with a toy benchmark.
-
[04/19/2026] On the Looped Transformers Controversy
Authors: Chris Hayduk · X Article 2026
Loop Mechanism: flat-loop
Focus: architecture
Domains: language-modeling · reasoning · scaling
TL;DR: Argues that benchmark patterns and serving-compute constraints make deterministic weight-tied looping a plausible explanation for Claude Mythos-like gains, while explicitly framing the claim as speculation rather than confirmation.
-
[01/12/2026] Looped-GPT: Looping During Pre-training improves Generalization
Authors: Sunny Sanyal · Personal Blog 2026
Loop Mechanism: flat-loop
Focus: architecture · training-algorithm
Domains: language-modeling · scaling · efficiency
TL;DR: Introduces Looped-GPT, a reverse-residual depth-recurrent GPT variant, and reports pre-training experiments where looped models improve generalization under matched parameter, token, and fixed-FLOPs settings.
-
[01/07/2020] Adaptive Computation Time (ACT) in Neural Networks [3/3]
Authors: Grigory Sapunov · Medium 2020
Loop Mechanism: flat-loop
Focus: architecture · inference-algorithm
Domains: language-modeling · algorithmic-reasoning · efficiency
TL;DR: Reviews Adaptive Computation Time in transformer-style models, focusing on Universal Transformers with dynamic per-position halting, adaptive attention span, and ALBERT-style cross-layer parameter sharing as related forms of adaptive or repeated computation.
We welcome additions, corrections, and scope challenges.
The preferred PR Submission Guide workflow is:
- Open the PR Submission Guide
- Reuse the searchable Loop Mechanism (mechanism_tags) / focus_tags / domain_tags, then fill the alias tags manually only if needed
- Review the generated path and generated YAML locally
- Fork the repo on GitHub to your own account
- Create a branch in your fork, create the generated file path, paste the generated YAML, and open a pull request
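For the local review step, a quick sanity check of the tag fields can catch typos before the pull request is opened. The sketch below is a hypothetical helper, not part of this repo's tooling. It assumes `mechanism_tags`, `focus_tags`, and `domain_tags` are top-level YAML keys holding lists of strings, and it only knows the mechanism tags that appear in the entries of this list; TAGS.md remains the authoritative vocabulary.

```python
from typing import Dict, List

# Mechanism tags visible in this list's entries. This is an assumption for
# illustration; TAGS.md defines the real vocabulary and may contain more.
SEEN_MECHANISM_TAGS = {"flat-loop", "hierarchical-loop", "parallel-loop", "implicit-layer"}
TAG_FIELDS = ("mechanism_tags", "focus_tags", "domain_tags")


def check_entry_tags(entry: Dict[str, object]) -> List[str]:
    """Return human-readable problems found in an entry's tag fields."""
    problems: List[str] = []
    for field in TAG_FIELDS:
        value = entry.get(field)
        if value is None:
            continue  # a field may be absent; this sketch does not treat that as an error
        if not isinstance(value, list) or not all(isinstance(t, str) and t for t in value):
            problems.append(f"{field} should be a list of non-empty strings")
    mechanism = entry.get("mechanism_tags")
    if isinstance(mechanism, list):
        for tag in mechanism:
            if isinstance(tag, str) and tag not in SEEN_MECHANISM_TAGS:
                problems.append(f"mechanism tag '{tag}' is worth double-checking against TAGS.md")
    return problems


if __name__ == "__main__":
    draft = {
        "mechanism_tags": ["flat-loop"],
        "focus_tags": ["architecture", "inference-algorithm"],
        "domain_tags": ["language-modeling"],
    }
    print(check_entry_tags(draft))
```

The entry is passed as an already-parsed dictionary so the sketch stays dependency-free; loading the YAML file itself is left to whatever parser you prefer.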
The guide generates YAML for papers/ or blogs/ directly. For blogs, the filename should follow blogs/YYYY-MM-DD-shortname.yaml. Blogs here should be substantive long-form technical posts, not short announcements or marketing pages.
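As a rough illustration of the blog filename convention, the sketch below builds a `blogs/YYYY-MM-DD-shortname.yaml` path from a date and a short name. The function name `blog_entry_path` is hypothetical, and the restriction to lowercase, hyphen-separated short names is an assumption made here for tidiness; the guide may accept other forms.

```python
import re
from datetime import date

# Assumed shortname shape: lowercase letters or digits, separated by single hyphens.
SHORTNAME_RE = re.compile(r"^[a-z0-9]+(?:-[a-z0-9]+)*$")


def blog_entry_path(published: date, shortname: str) -> str:
    """Return the blogs/YYYY-MM-DD-shortname.yaml path for a blog entry."""
    if not SHORTNAME_RE.match(shortname):
        raise ValueError(f"shortname should be lowercase and hyphen-separated: {shortname!r}")
    # date.isoformat() yields YYYY-MM-DD, matching the documented pattern.
    return f"blogs/{published.isoformat()}-{shortname}.yaml"


if __name__ == "__main__":
    print(blog_entry_path(date(2026, 4, 19), "loop-model-flops"))
    # prints blogs/2026-04-19-loop-model-flops.yaml
```

In practice the PR Submission Guide generates the path for you, so a helper like this is only useful for double-checking a hand-edited filename.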
See CONTRIBUTING.md, TAXONOMY.md, and TAGS.md for details.
This list is generated from papers/*.yaml and blogs/*.yaml by scripts/build.py.
