Back in 2011, when I first started writing SEO blog posts for Moz, I was writing walls of text because that was my nature, and they were popular despite it. Then-CMO Jamie Steven instructed me to read Cyrus Shepard’s 10 Super Easy SEO Copywriting Tips for Improved Link Building for direction on how to structure my writing for better performance.
In the article, Cyrus comes out swinging with a visual comparison of a wall of text versus a well-structured piece of content with lots of formatting and imagery. Using data to drive the point home, he shows how two posts by the same great internet marketer had dramatically different performance: the structured one captured 62X the external links and nearly 4X the time on page.
I was hooked and those insights have stuck with me ever since. In fact, you can trace back elements of anything I’ve written over the last 14 years to the formatting lessons of that classic post. I’d go as far as to say I think more about these principles than I do so-called SEO “best practices.”
Part of why what Cyrus outlined resonated with me so much is that the principles just make sense. Conceptually, it all harkens back to everything we learned about how humans interact with information when we read “Don’t Make Me Think.” Over time, I’ve seen that specificity and better content UX yield better performance on any human-driven metric we measure, as well as more visibility in search engines and large language models.
But…Google Says Don’t Break Your Content Into Bite-Sized Chunks
Recently, on the Search Off the Radar podcast, Danny Sullivan shared his opinion on “chunking” as a tactic to drive visibility in AI Search surfaces (emphasis mine).
“One of the things I keep seeing over and over in some of the advice and guidance and people are trying to figure out what do we do with the LLMs or whatever, is that turn your content into bite-sized chunks, because LLMs like things that are really bite size, right?
So we don't want you to do that. I was talking to some engineers about that. We don't want you to do that. We really don't. We don't want people to have to be crafting anything for Search specifically. That's never been where we've been at and we still continue to be that way. We really don't want you to think you need to be doing that or produce two versions of your content, one for the LLM and one for the net.”
After I laughed to myself in the graveyard of AMP POVs and technical specifications, I turned the podcast back on.
He continued, pre-empting the “but it works, I’m going to do it anyway” argument (emphasis still mine):
“Let's assume that, in some edge cases, let's even assume maybe in more than some edge cases, you're finding you're getting some advantage here. Maybe tiny degree measure. No, this is my secret weapon. It's doing it." Great. That's what's happening now. But tomorrow the systems may change.
So you've gone through all this effort. You've made all these things that you did specifically for a ranking system, not for a human being, because you were trying to be more successful in the ranking system, not staying focused on the human being. And then the systems improve, probably the way the systems always try to improve, to reward content written for humans. All that stuff that you did to please this LLM system that may or may not have worked, may not carry through for the long term.
So was that the best use of your time and your energy? Was that the best use of putting turmoil into your marketing department, your content department, and all your other stuff so that you could say, "A-ha, I've got the new thing that you wanted, I've brought it down from the mountain and here it is. Do these sorts of things.”
Before I take this where you know I will, let me first say this.
I deeply respect Danny Sullivan for what he has done for the search marketing community both inside and outside of Google. Full stop.
However, I have two problems with these statements and want to clarify for anyone who is questioning the value of improving content structure (partially) in the service of better visibility:
- Chunking and creating content for users are not mutually exclusive.
- The statements misalign with how Retrieval Augmented Generation technology functions and with where future technologies are going.
In the spirit of chunking, let’s add a heading and get to my next series of extractable atomic points.
Chunking and Writing for Users Are Not Mutually Exclusive
First, we need to disambiguate “chunking” as it’s used to describe an operation in Retrieval Augmented Generation (RAG) systems from how it’s being used to describe a content optimization action.
Chunking, as it has been co-opted, really means structuring content so that its passages and statements perform better when retrieved in a RAG pipeline. I know this because I’m one of the first people to drag the term from the AI/IR space into the SEO space.
If we’re being reductive (like most “it’s just SEO” arguments are), we’re effectively talking about content design or UX writing. As with everything in search and content marketing, machines are just a subset of the target personas. So the idea of preparing content solely for the machine is still nonsense.
When we’re talking about chunking in this sense, the act of structuring content overlaps with the content design aspects of Cyrus’s post. Where it differs is in the reasoning behind the decisions you make in the copy you write.
While the act overlaps with the largely qualitative processes people have historically used, it is not the same. Effective chunking follows all the practices Cyrus discussed, but combines them with vector analysis to verify improvements. For clarity, chunking is but one of an array of tactics you should apply from the content engineering toolbox. What differentiates it from UX writing or standard copywriting is the relevance calculation that must be accounted for at the passage level. It’s not just chopping paragraphs into smaller paragraphs, using more headings, and hoping for the best.
Consider this: no one can reliably determine whether content is generative without watermarks. So Google, needing a more reliable signal, leverages user interactions to determine whether content should continue to rank. The main attribute that yields better content performance is better structure, irrespective of why you do it.
So, What Truly is Chunking?
Chunking is the action that RAG systems take with content when they capture it to prepare it for the retrieval process: breaking content into a series of components that can be individually retrieved based on how relevant they are to a prompt or user query.
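To make that concrete, here’s a minimal sketch of the simplest strategy, fixed-length chunking with overlap, in Python. This illustrates the concept rather than any particular system’s implementation; the word-based splitting and the sizes are my own stand-ins.

```python
def chunk_fixed(text: str, chunk_size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into fixed-size word chunks, overlapping so that
    statements straddling a boundary survive intact in at least one chunk."""
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunk = " ".join(words[start:start + chunk_size])
        if chunk:
            chunks.append(chunk)
    return chunks
```

Each of those chunks gets embedded and scored on its own, which is exactly why the boundaries matter.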
This is also a function of dense retrieval, which Google effectively announced when they revealed their implementation of Passage Indexing. In passage indexing, passages are embedded and stored, and the query is embedded the same way. Approximate Nearest Neighbor (ANN) searches are then performed to pull the closest-matching passages. This is one of the building blocks of RAG, the primary paradigm behind AI Search.
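Here’s a stripped-down sketch of that dense retrieval loop, assuming the open-source sentence-transformers library as a stand-in for whatever embedding models Google actually runs, and using exhaustive search where a production system would use a true ANN index like ScaNN or FAISS:

```python
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # stand-in embedding model

passages = [
    "Passage indexing lets Google rank a single relevant section of a page.",
    "Chunking splits documents into independently retrievable units.",
    "Our return policy allows refunds within 30 days of purchase.",
]

# Embed passages and the query into the same vector space
passage_vecs = model.encode(passages, normalize_embeddings=True)
query_vec = model.encode(["how does passage indexing work"], normalize_embeddings=True)[0]

# With normalized vectors, the dot product is the cosine similarity
scores = passage_vecs @ query_vec
for idx in np.argsort(-scores):
    print(f"{scores[idx]:.3f}  {passages[idx]}")
```

The passage closest to the query wins, regardless of how the rest of the page performs.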
There are a variety of chunking strategies including, but not limited to, semantic, layout-aware, and fixed-length (token-size) chunking. Metehan’s research into Google’s public Vertex AI offering suggests that theirs may be a combination of fixed-length and layout-aware chunking with the cascading heading option.
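As an illustration, here’s a hedged sketch of what layout-aware chunking with cascading headings might look like. This is my approximation of the documented behavior, not Google’s actual implementation; the chunk size and heading handling are assumptions.

```python
import re

def chunk_layout_aware(markdown: str, max_words: int = 250) -> list[str]:
    """Split on headings and prepend the active heading trail (the "cascade")
    to each chunk so every passage carries its own context."""
    trail: dict[int, str] = {}  # heading level -> current heading text
    chunks: list[str] = []
    buffer: list[str] = []

    def flush():
        if buffer:
            context = " > ".join(trail[level] for level in sorted(trail))
            chunks.append((context + "\n" if context else "") + " ".join(buffer))
            buffer.clear()

    for line in markdown.splitlines():
        heading = re.match(r"^(#{1,6})\s+(.*)", line)
        if heading:
            flush()
            level = len(heading.group(1))
            trail[level] = heading.group(2)
            for deeper in [k for k in trail if k > level]:
                del trail[deeper]  # deeper headings no longer apply
        elif line.strip():
            buffer.append(line.strip())
            if sum(len(part.split()) for part in buffer) > max_words:
                flush()
    flush()
    return chunks
```

Notice that every chunk carries its heading trail. That’s what lets a passage stand on its own once it’s been torn out of the page.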
So, we are using the same term to refer to both the action the system takes to decompose content and the work we’re doing to restructure content so it’s easier to extract. I wanted to clarify that for people who look to invalidate meaningful discussion based on syntax and vocabulary.
Why is Chunking Different from Classic Content Optimization for SEO?
In classic SEO, content boundaries were defined largely by intuition and editorial convention. Even when using content optimization tools, the analysis was typically lexical and page-level, evaluating aggregate term usage rather than the relevance of individual passages. As a result, while pages may have been visually or structurally segmented, those segments were not deliberately optimized as independent units of meaning.
Chunking as an optimization tactic changes that. With clearer insight into how modern systems evaluate content at the passage level, we can now treat each chunk as a discrete relevance object. This makes it possible to intentionally shape structure, specificity, and context within each passage to influence how it is measured, compared, and selected. Instead of optimizing pages holistically and hoping relevance emerges, chunking allows us to precisely adjust content at the level where relevance is actually computed.
Insights that Dan Petrovic shared on the length of Google’s grounding chunks, and on how much of your content gets used after it makes it through filtering, give us more clarity on the atomicity. We also know that the natural paragraph boundaries we create influence what is considered a chunk in these systems.
Historically, SEO treated the page as a single context window, with no real measurable way to tell whether your optimizations did anything beyond the rankings themselves. Sure, the various content optimization tools give you a lexical score, but nothing that aligns with the breadth of modern information retrieval. Chunking offers a direct feedback loop: you can isolate and improve specific blocks of content and influence how they perform in AI surfaces.
The Google-shaped Web
Much has been said about how the web has conformed to what performs best in Google. It’s expected when Google is the biggest referral channel. However, with the advent of generative AI, websites are no longer adapting to a single set of guidelines or incentives. Content is now shaped by multiple platforms and channels, including search engines, AI assistants, recommendation systems, and social feeds, each imposing different structural and semantic pressures on how information is created and how users react to it.
After two decades of being in the space, I can say definitively that statements like this are how Google keeps marketers as their unpaid workforce, nudging the web toward what works best for Google.
Googlers often speak as though they are merely extracting natural patterns from the web, positioning themselves as neutral observers. But they are not the Watchers. They are the Celestials. One watches without interference; the other designs systems that determine what survives. Google’s ranking and retrieval decisions have shaped the web for decades. Entire categories of sites have converged on similar layouts, headings, FAQs, and explanatory formats not because those patterns emerged organically, but because Google’s systems and PR consistently reinforced them.
It’s not that they “don’t want people to have to be crafting anything for search specifically.” It’s that they “don’t want people to have to be crafting things for search that take advantage of Google.”
What changes with generative AI is not that this influence goes away, but that it begins to fragment. Search is no longer only about ranking pages. It is about selecting, extracting, and recombining passages across sources. The incentives now favor content that can be easily segmented, understood in isolation, and reused by machines.
This is still a Google-shaped web. But the shape is starting to loosen, creating the conditions for what comes next.
Google still has search, but the agent-shaped web is emerging outside their control
As generative AI becomes a primary interface for information, the incentives that once forced publishers to conform to Google’s preferences are weakening. Users are getting answers without clicks, referral traffic is less reliable, and the payoff for strict adherence to SEO best practices and Google’s guidelines continues to shrink – even when Google results are a key input to those answers. The result is a gradual but meaningful loss of influence over how content is structured and prioritized.
(sidebar: I’ll have you know I wrote that em dash myself in that last paragraph.)
In conversations with F100 clients, this shift shows up clearly. A few are moving towards abandoning search outright, but many are questioning how much effort it still deserves. Investment is spreading to other formats and channels, and teams are becoming more willing to deviate from rigid SEO conventions. Not because best practices are “wrong,” but because repeated testing shows their impact is increasingly marginal.
What’s emerging is an agent-shaped web. Content is no longer written primarily to satisfy a single ranking system, but to be usable by agents that retrieve, reason over, and recombine information across sources. These non-Google systems do not publish guidelines. They do not moralize tactics as “white hat” or “black hat.” They simply use the content that works. In that environment, many behaviors Google historically discouraged are not violations. They are advantages.
This is how Google’s grip loosens. When influence shifts from ranking pages to supplying agents with usable inputs, control fragments. The web stops optimizing for compliance and starts optimizing for utility.
That’s why, when I’ve asked Google engineers what to do beyond “make great content” to improve rankings, the answer has consistently been “nothing.” That response only holds if Google remains the central force shaping outcomes. In an agent-shaped web, it isn’t. And, that’s why you see them creating fast-follow protocols like A2A after MCP and UCP after ACP.
How Chunking Improves Relevance
We know that content structure influences people and they should be the primary audience for any content adjustment. Fundamentally though, Danny’s statements do not align with how the underlying technology functions.
Search engines and Large Language Models are both built on the vector space model. Relevance is a function of distance measures between queries/prompts and documents. Where search engines measure this to rank documents, LLMs use the plotted relationships to predict the next token.
The distance measures are the values that are compared to determine what to feed the LLM. In synthesis pipelines, there is a pairwise determination where passages are compared side by side to determine what gets sent to the language model. A longer piece of text that covers multiple subjects typically has lower relevance than a shorter piece of text that covers a single subject.
Let’s illustrate that idea with an actual example in my tool BubbaChunk (if you know you know).
The paragraph above targets the queries [machine learning] and [data privacy]. When I generate embeddings for the queries and for that paragraph, using cosine similarity as my distance measure, I get a 0.541 for [machine learning] and a 0.620 for [data privacy].
Now, let’s split that paragraph in two and not change anything else. The machine learning paragraph has now improved 19.24% to a 0.645 cosine similarity. The data privacy paragraph improved 1.29% to 0.627.
When compared against passages that cover both subjects, each split now has a better opportunity to perform. Within the environment of a full page, other elements like the heading hierarchy and surrounding passages can be used to influence this. I can further improve the scores with semantic triples, entity salience, and so on, but in isolation, restructuring this content by changing its boundaries improves its retrievability.
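If you want to approximate this kind of before/after measurement without BubbaChunk, here’s a minimal sketch using the open-source sentence-transformers library. The paragraph text is a hypothetical stand-in for the example in the screenshots, and your exact scores will vary by embedding model.

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # stand-in embedding model

# Hypothetical stand-ins for the combined paragraph and its two splits
combined = ("Machine learning models improve as they ingest more data, "
            "which raises data privacy questions about how that data is "
            "collected, stored, and shared.")
split_ml = "Machine learning models improve as they ingest more data."
split_dp = ("Data privacy questions arise around how training data is "
            "collected, stored, and shared.")

queries = ["machine learning", "data privacy"]
q_vecs = model.encode(queries, normalize_embeddings=True)
p_vecs = model.encode([combined, split_ml, split_dp], normalize_embeddings=True)

# Cosine similarity of each query against each version of the content
for q_vec, query in zip(q_vecs, queries):
    for p_vec, label in zip(p_vecs, ["combined", "split_ml", "split_dp"]):
        print(f"[{query}] vs {label}: {q_vec @ p_vec:.3f}")
```

Run the loop before and after a restructuring and you have the feedback loop classic SEO never gave you.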
Some folks may be invested in Google’s multi-vector retrieval technique MUVERA. BubbaChunk takes a similar approach, and MUVERA uses Chamfer Similarity as its distance measure. Those are the Chamfer values you see in the screenshots. You’ll note that there are improvements to all distance measures when I’ve made this adjustment.
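For the curious, Chamfer similarity is simple to compute: for each query token vector, take its best match among the passage’s token vectors, then sum those maxima. A minimal numpy sketch (toy multi-vector embeddings, not MUVERA’s fixed dimensional encodings):

```python
import numpy as np

def chamfer_similarity(query_vecs: np.ndarray, passage_vecs: np.ndarray) -> float:
    """Sum, over each query token vector, of its maximum inner product
    with any of the passage's token vectors."""
    sims = query_vecs @ passage_vecs.T  # (query_tokens, passage_tokens)
    return float(sims.max(axis=1).sum())
```

MUVERA’s contribution is approximating this multi-vector score with single fixed-dimensional vectors so that fast ANN search still works.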
If you’re curious, adding the headings does improve the scores significantly. Below you’ll see adding the header to the “Data Privacy” paragraph improved cosine similarity another 17.54%.
No matter how you slice it, embed it, or measure it, improving the structure of content yields better scores from machines and better performance with people.
Structured Content is Better in Any Paradigm
Danny’s comments suggest that Google may eventually evolve its systems to discourage overt structuring techniques. That assumes structure is a temporary optimization tactic and suggests we’re moving to a world where “high-quality writing” is a monolith that the algorithm will simply “understand.” Google’s own research direction, alongside adjacent work from Meta, Berkeley, and MIT, points in the opposite direction. As systems gain access to more context, memory, and recursion, structure becomes more important, not less. Across multiple papers, Google Research is clearly pursuing near-infinite context through memory rather than brute-force attention, and they are building atop the state of the art from other groups in the space.
Reviewing the state of the art, we find that Berkeley’s Ring Attention demonstrates how extremely long sequences can be processed by breaking them into rotating segments, where each segment attends locally while passing the summarized state forward. In that structure, the model does not need to see everything at once. It needs to preserve meaning across time. Systems like this rely on the continuity of information within a segment. By structuring content into atomic passages, you ensure each “rotating segment” contains a complete, unfragmented unit of meaning.
“Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention” formalizes this further by introducing compressive memory that allows models to retain and reuse information far beyond a fixed context window. You can’t compress a mess without losing the message. Atomic, legible passages act as high-fidelity signals that survive the compression process, ensuring your information is correctly retrieved later.
Meta’s MemWalker pushes in the same direction by organizing content into memory trees that can be traversed, revisited, and updated. Structure provides the branches. By defining clear boundaries and semantic anchors, you build a “map” that helps the agent navigate and reconstruct the mental model of your information.
These approaches make Google’s intent clear. The goal is not just larger windows. It is durable, near-infinite context.
(sidebar: Those last 4 paragraphs were originally a single paragraph. I split them up while editing to isolate each specific idea and align them with the images from the papers. That’s an example of chunking in action.)
MIT’s work on Recursive Language Models reaches the same destination through a different path. Rather than expanding context directly, RLMs decompose long inputs into smaller units and recursively invoke the model over the most relevant chunks. In effect, the model reasons over content iteratively, revisiting and recombining passages as needed. This reinforces the same reality: passages are the unit of interaction.
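To make the pattern concrete, here’s a deliberately simplified sketch of that recursive traversal. A toy lexical scorer stands in for the model calls a real RLM would make, and a real system would recombine multiple chunks rather than descend a single path:

```python
def relevance(query: str, chunk: str) -> float:
    """Toy lexical scorer standing in for a model call."""
    q_terms = set(query.lower().split())
    c_terms = set(chunk.lower().split())
    return len(q_terms & c_terms) / (len(q_terms) or 1)

def recursive_read(query: str, text: str, max_chars: int = 500) -> str:
    """Recursively descend into the most relevant chunk until the
    remaining context is small enough to reason over directly."""
    if len(text) <= max_chars:
        return text
    chunks = [c for c in text.split("\n\n") if c.strip()]
    if len(chunks) <= 1:
        return text[:max_chars]  # no paragraph structure left to descend into
    best = max(chunks, key=lambda c: relevance(query, c))
    return recursive_read(query, best, max_chars)
```

Note what the recursion keys on: paragraph boundaries. If your paragraphs aren’t atomic, the traversal descends into noise.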
DeepMind’s Mixture of Recursions (MoR) pushes this idea into the architecture itself. Instead of a fixed depth of computation, tokens are routed through different numbers of recursive steps. Some content is processed shallowly. Other content is revisited repeatedly. This is adaptive reasoning at the token level, and it further removes any illusion that content is consumed linearly or holistically. What matters is which pieces survive repeated passes through the system.
A common rebuttal is that we are entering an era of “Infinite Context.” With models like Gemini 3 Pro offering a 1-million-token context window, why bother chunking when the model can ingest the whole book?
The answer lies in inference cost and reasoning depth. The MoR paper reveals that not every token needs the same amount of “thinking.” In an agent-shaped web, computation is the new scarcity. Well-structured, atomic content allows the model’s ‘router’ to identify meaning quickly and exit the recursive loop early. Brute-forcing an unstructured 2-million-token wall of text is computationally expensive and prone to ‘context rot.’ If you want an agent to pick your content over a competitor’s, you should make it the path of least resistance. You don’t want to be just readable, but computationally efficient to digest.
At the bleeding edge, Google’s Nested Learning moves beyond retrieval entirely. With the HOPE architecture, passages are no longer just fetched as context. They are used as signals for “memory infusion,” updating the model’s inner loop. This is where control erodes most clearly. Once content moves from retrieval into synthesis and memory update, our influence largely ends. Just as we can influence how we rank for synthetic queries but not how answers are composed, we can influence which passages are legible and extractable, but not how they are ultimately weighted, combined, or remembered.
In this environment, atomic legibility is the only survival strategy. If a passage isn’t self-contained (meaning it lacks its own entity, context, and claim), it fails to “infuse” correctly. It becomes noisy data. Just as Infini-attention relies on “compressive memory” to store long-term state, your content must be compressible. Each chunk must stand as a standalone signal so that when the agent tears the binding off, so to speak, the individual passage survives the transition from retrieval to synthesis.
Furthermore, the shift to an agent-shaped web isn’t limited to text. Agentic systems are increasingly multimodal, needing to reconcile text with images, charts, and tables. Without layout-aware structure, these relationships disintegrate during the retrieval process. By defining clear boundaries and semantic anchors, we aren’t just helping the model read; we are helping it reconstruct the mental model of the information. Structure is the glue that ensures a chart and its context remain unified when an agent retrieves them from a near-infinite context window.
None of this weakens the case for structure. It sharpens it. In every one of these systems, passages remain the atomic unit of meaning. Whether through attention, memory, or recursion, models operate on chunks, not pages.
Google still wants us to produce books: pages that can host ads, preserve attribution, and sustain the economics of the open web. Agentic systems read differently. They tear the binding off the book, ignore the table of contents, and pull out only the pages and paragraphs they need, sometimes revisiting them again and again. In that world, structure is no longer about presentation. It is about making meaning legible at the passage level.
Across every paradigm, from the first Google-shaped web to the looming agent-shaped web, the best we can do remains the same. But the ‘why’ has changed. We are no longer just formatting for ‘skimmability’ or ‘dwell time.’ We are formatting for Programmatic Legibility. We are building the API of meaning. By designing content so each chunk stands on its own with clear signals, we aren’t performing a workaround. We are ensuring our information survives the recursive, synthetic, and agentic loops that are now defining the web.
We’ll Keep Being the Signal through the Noise
There’s a lot of confusion right now as search and generative AI continue to blend. There are also a lot of people who don’t understand the nuances, so they keep trying to shoehorn the changes into what they already do and know so they can feel superior, or at least relevant.
This also makes it difficult for the community because the only reliable sources of information come from reading patents, white papers, playing with the platform’s public APIs, and then experimenting to see what works. Not everyone is capable of those things, nor do they have the time or wherewithal, so they do what I said above.
It’s unfortunate that Google continues to want to play the FUD game and contradict what we can see with our own eyes in their research, patents, and in how their systems react to changes. That behavior further reinforces that we are not partners in making the world’s information accessible. We are the unpaid extension of their workforce.
For these reasons, iPullRank will continue to do the work and the R&D and share what really works and why. Our AI Search Manual is an example of that. We’ll continue to support everyone striving to build in this new world and this will be one of the threads we continue at SEO Week in April.
So, get your ticket to SEO Week and hear from the sharpest minds on what’s actually working for AI Search.