Breaking the Chains of Computational Limits
Researchers at Mila have unveiled a technique that could change how large language models (LLMs) handle complex reasoning. Dubbed ‘Markovian Thinking,’ the approach structures the reasoning process into fixed-size chunks, sharply reducing the computational burden traditionally associated with lengthy reasoning tasks. The researchers estimate it can cut training costs by more than two-thirds for a 1.5-billion-parameter model compared to conventional methods.
The key innovation lies in Delethink, an environment that reorganizes the reasoning chain into manageable segments. This shift addresses the ‘quadratic curse’ that plagues modern transformer-based models, where computational costs soar as reasoning chains lengthen. By sidestepping this quadratic growth in computational demand, Delethink lets AI models engage in extended reasoning without succumbing to prohibitive costs, enabling them to tackle complex tasks previously deemed infeasible due to resource constraints.
Rethinking AI’s Approach to Problem Solving
Traditional approaches to AI reasoning depend on generating long series of intermediate ‘thinking’ tokens, often referred to as chain-of-thought (CoT). While reinforcement learning (RL) has enhanced these capabilities, computational cost remains a critical flaw. The state of the AI, encompassing the prompt and all reasoning tokens generated so far, expands with each new token, and because every new token must attend over that entire state, total compute grows quadratically with the length of the chain.
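A rough back-of-the-envelope sketch makes the difference concrete. The function below counts attention operations for a standard CoT, where token *i* attends over all *i* prior tokens, versus chunked reasoning capped at a fixed window; the token counts are illustrative, not figures from the paper.

```python
def longcot_attention_ops(total_tokens: int) -> int:
    """Standard CoT: token i attends over all i previous tokens,
    so total attention work grows quadratically in chain length."""
    return sum(i for i in range(1, total_tokens + 1))

def chunked_attention_ops(total_tokens: int, chunk_size: int) -> int:
    """Fixed-size chunks: attention never spans more than chunk_size
    tokens, so total work grows only linearly in total_tokens."""
    ops = 0
    for _ in range(total_tokens // chunk_size):
        ops += sum(i for i in range(1, chunk_size + 1))
    return ops

print(longcot_attention_ops(96_000))         # ~4.6 billion operations
print(chunked_attention_ops(96_000, 8_000))  # ~384 million operations
```

For a 96,000-token reasoning chain split into 8,000-token chunks, the chunked variant does roughly a twelfth of the attention work, and the gap widens as chains get longer.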
Mila’s researchers have taken a radical departure from this paradigm. By developing an RL environment that sidesteps the quadratic problem, they have laid the groundwork for AI models capable of multi-week reasoning and scientific discovery. As co-author Amirhossein Kazemnejad notes, the current LongCoT paradigm is ill-equipped to support such capabilities due to its inherent computational inefficiencies. In contrast, Markovian Thinking separates the duration of reasoning from the amount of context processed, transforming the quadratic growth problem into a manageable linear computation.
Delethink: A New Paradigm in AI Reasoning
Delethink, the brainchild of Mila’s researchers, enforces a novel reasoning paradigm where AI models operate within fixed-size chunks, such as 8,000 tokens at a time. This method allows the model to reason using the classic attention mechanism within each chunk. Once the chunk limit is reached, the environment resets the context, incorporating a ‘carryover’ from the previous chunk into a new prompt. This carryover might include the last few tokens or a summary of critical results, ensuring continuity in reasoning.
This innovative approach forces the model to learn how to embed its progress into a ‘textual Markovian state,’ enabling it to retain essential information across reasoning steps. Kazemnejad emphasizes that the original input prompt remains unaltered, focusing the method on the reasoning phase without modifying the initial data. This strategic separation allows the model to maintain a constant reasoning context window, effectively addressing concerns about memory retention and continuity.
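The chunk-and-carryover loop described above can be sketched as follows. This is a hypothetical illustration, not Mila's implementation: `generate` stands in for any LLM call, and the carryover policy (keeping the last few hundred tokens as the textual Markovian state) is one of the options the article mentions.

```python
# Hypothetical sketch of Delethink-style chunked reasoning.
# Names, sizes, and the carryover policy are illustrative.

CHUNK_SIZE = 8_000   # max tokens in the reasoning context at once
CARRYOVER = 512      # tail tokens carried into the next chunk

def generate(context: list[str], budget: int) -> list[str]:
    """Placeholder LLM call: emits up to `budget` reasoning tokens."""
    return [f"tok{i}" for i in range(budget)]

def delethink_reason(prompt: list[str], max_chunks: int) -> list[str]:
    trace: list[str] = []
    carry: list[str] = []
    for _ in range(max_chunks):
        # The context is always prompt + carryover, never the full
        # trace, so the attention window stays bounded by CHUNK_SIZE.
        context = prompt + carry
        budget = CHUNK_SIZE - len(context)
        chunk = generate(context, budget)
        trace.extend(chunk)
        # Reset the context: keep only the tail of the last chunk
        # as the 'textual Markovian state'.
        carry = chunk[-CARRYOVER:]
    return trace
```

The key property is that the context passed to the model is bounded by `CHUNK_SIZE` on every call, regardless of how long the overall trace grows; the original prompt is re-included unchanged each time, matching Kazemnejad's point that only the reasoning phase is restructured.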
Implications and Future Horizons
The implications of Markovian Thinking extend far beyond theoretical advancements. In practical applications, models trained with Delethink demonstrate remarkable efficiency, using linear compute and constant memory during inference. This efficiency translates directly into cost savings for enterprises, as AI agents can engage in prolonged reasoning tasks, such as debugging large codebases, with significantly reduced expenses compared to traditional methods.
Interestingly, even off-the-shelf reasoning models exhibit some innate ability to think in a Markovian manner, suggesting immediate practical benefits for developers. This latent capability, combined with robust performance across complex tasks, positions Delethink as a scalable solution compatible with state-of-the-art models. As the researchers conclude, Markovian Thinking not only opens the door to next-generation AI capabilities but also paves the way for models that can ‘think’ across vast horizons, a crucial step toward scientific discovery and beyond.
Meta Facts
- 💡 Markovian Thinking reduces computational costs by over two-thirds for 1.5B parameter models.
- 💡 Delethink-trained models can reason up to 24,000 tokens, surpassing LongCoT models.
- 💡 Models trained with Delethink use linear compute and constant memory during inference.
- 💡 Delethink forces models to embed progress into a ‘textual Markovian state’.
- 💡 Even off-the-shelf models exhibit some Markovian reasoning capabilities.

