Google's LLM Patent: Teaching AI Who You Are

A Google patent describes AI building an internal model of what your business is and who it serves. Here's how to make your identity legible to AI search.

Ken W. Button - Technical Director at Button Block

Published: June 24, 2026 (11 min read)

Introduction

For most of the last two decades, the job of SEO was simple to state, if hard to do: rank a page for the keywords your customers type. A recent Google patent suggests the goal is shifting underneath us. As Search Engine Land reported on June 22, 2026, Google's “Data extraction using LLMs” patent describes a system that reads your website and other public sources to synthesize an understanding of a particular entity — what your business is, what it does, and who it serves.

That reframes the work. The objective is no longer only “rank for a phrase.” It is teaching AI who you are — giving the model an accurate, consistent identity it can recognize and reproduce when someone asks a question. This is the same shift we've tracked in our answer engine optimization guide: visibility increasingly depends on whether AI systems understand your business well enough to mention it.

Before we go further, one honest caveat that we'll return to at the end: a patent describes what a company could build, not what is necessarily running in live ranking today. Treat everything here as directional — a strong signal about where AI search is heading, not a guaranteed ranking recipe.

Key Takeaways

  • A Google patent describes AI reading your site and public sources to build a structured “characterization” of your business as an entity.
  • The patent frames this as interpretation, not copying — the system forms conclusions about who you are, not a verbatim index of your pages.
  • This pushes the SEO goal from “rank for keywords” toward “make your identity consistent and legible” across the web.
  • The strongest identity signals are an entity home, structured data, sameAs links to Wikipedia/Wikidata, and consistent details across third-party sources.
  • Content that AI can extract is dense, self-contained, and uses the same terminology your industry agrees on.
  • A patent is not confirmation of live ranking behavior — build for durable clarity, not a single algorithm.

What does Google's LLM patent actually describe?

The patent at the center of this discussion is, in Google's own framing, a method to “extract content from a website or domain and other public sources to synthesize an understanding of a particular entity.” That word entity is doing a lot of work. Google defines an entity broadly — people, companies, businesses, places, objects, and concepts — and the system's job is to interpret information about that entity rather than simply file it away.

The process the filing describes is roughly this: identify a domain and the entity associated with it, gather information from the pages on that domain, and process it through a large language model. The output is what the patent calls a characterization — and here's the part worth sitting with. According to Search Engine Land's reporting, the characterization is “an interpretation of the extracted first content and extracted second content rather than a verbatim duplication of the extracted content.” In plain English: the model isn't quoting your About page. It's drawing conclusions from it.

A separate Search Engine Land analysis by Olaf Kopp, examining what Google and Microsoft patents reveal about generative engine optimization, adds useful mechanical detail. Kopp describes the same “Data extraction using LLMs” patent as treating entire websites as a single input to generate a unified brand characterization, organized hierarchically — parent nodes for broad categories, leaf nodes for specific details. The implication is that scattered, contradictory information across your site doesn't just confuse a visitor. It produces a muddy characterization at the source.
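
One way to picture the hierarchy Kopp describes is as a small tree: parent nodes for broad categories, leaf nodes for specific details. The Python sketch below is purely illustrative. The `Node` class, the category names, and the conflict check are our assumptions, not anything the patent specifies; the point is only to show how contradictory pages surface as a leaf with more than one value.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One node in a hypothetical entity characterization tree."""
    label: str
    values: set[str] = field(default_factory=set)  # details seen across pages
    children: dict[str, "Node"] = field(default_factory=dict)

    def add(self, path: list[str], value: str) -> None:
        """Record a value under a category path, creating nodes as needed."""
        node = self
        for part in path:
            node = node.children.setdefault(part, Node(part))
        node.values.add(value)

    def conflicts(self, prefix: str = "") -> list[str]:
        """Return paths of nodes where pages disagree (more than one value)."""
        here = f"{prefix}/{self.label}" if prefix else self.label
        found = [here] if len(self.values) > 1 else []
        for child in self.children.values():
            found.extend(child.conflicts(here))
        return found


# Made-up facts pulled from several pages of one fictional site.
root = Node("Acme Software")
root.add(["services", "primary"], "payroll software")  # homepage
root.add(["services", "primary"], "payroll software")  # About page
root.add(["audience", "size"], "small businesses")     # homepage
root.add(["audience", "size"], "enterprise teams")     # old landing page

print(root.conflicts())  # ['Acme Software/audience/size']
```

Here the two pages that agree on the primary service collapse into one clean leaf, while the stale landing page leaves the audience leaf with two competing values.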

If you've read our explainer on how your brand becomes math in embedding space, this should feel familiar. The patent is a concrete mechanism for something AI search has been doing implicitly: converting “your business” into a structured internal representation that the model reasons over.

Why does “teaching AI who you are” replace “ranking for keywords”?

Keyword SEO treats search as string matching: a query is text, a page is text, and the engine finds overlap. Entity-based search treats the query as a question about a thing. As OutpaceSEO's guide to entity-based SEO puts it, citing Google's own definition, an entity is “a thing or concept that is singular, unique, well-defined, and distinguishable” — mapped to a unique identifier rather than a text string. When someone searches “Apple,” entity context decides whether they mean the fruit or the technology company.
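
To make the string-versus-thing contrast concrete, here is a toy Python sketch. The entity IDs, the context word sets, and the overlap scoring are all made up for illustration; real search systems are far more sophisticated, and nothing here reflects how Google resolves entities.

```python
from typing import Optional

# Hypothetical entity records; IDs and context words are invented.
ENTITIES = {
    "ent:apple-fruit": {
        "name": "Apple",
        "context": {"fruit", "orchard", "pie", "tree"},
    },
    "ent:apple-company": {
        "name": "Apple",
        "context": {"iphone", "mac", "technology", "company"},
    },
}


def keyword_match(query: str) -> list[str]:
    """String matching: every record whose name appears in the query."""
    words = set(query.lower().split())
    return [eid for eid, e in ENTITIES.items() if e["name"].lower() in words]


def resolve_entity(query: str) -> Optional[str]:
    """Entity resolution: pick the candidate whose context best fits the query."""
    words = set(query.lower().split())
    scored = [
        (len(e["context"] & words), eid)
        for eid, e in ENTITIES.items()
        if e["name"].lower() in words
    ]
    best = max(scored, default=(0, None))
    return best[1] if best[0] > 0 else None


print(keyword_match("apple pie recipe"))     # both records match the string
print(resolve_entity("apple pie recipe"))    # ent:apple-fruit
print(resolve_entity("apple iphone price"))  # ent:apple-company
```

String matching cannot tell the two records apart, while the entity lookup uses surrounding context to land on one identifier.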

That distinction matters more in AI search because the engine no longer just returns links. It composes an answer, and to do that it has to know things about the businesses it might mention. If the model's characterization of you is thin or inconsistent, the risk isn't a lower ranking; you may not be a candidate for the answer at all.

The scale here is real, even if the exact figures vary by source. OutpaceSEO reports AI Overviews now trigger on roughly 18.76% of US search keywords and notes that ChatGPT processes around 2.5 billion daily prompts. The same guide flags a sobering gap: fewer than 25% of the most-mentioned brands are also the most-sourced in AI platforms. Being talked about isn't the same as being understood well enough to be cited. (Treat these as reported third-party figures, not Google-confirmed numbers.)

There's a second mechanism worth naming. Kopp's patent review describes “query fan-out,” where AI systems deconstruct one ambiguous question into several specific intents before retrieving anything. A clear entity identity helps the model match you to the right intent instead of guessing. This is the practical heart of the difference between GEO, AEO, and LLMO: all three ultimately depend on the engine knowing who you are.

What signals actually teach AI search your identity?

If the goal is a clean, consistent characterization, the work is mostly about removing ambiguity. These are the signals that carry the most weight, drawn from how knowledge bases and structured data actually operate.

| Signal | What it does | Effort | Honest caveat |
| --- | --- | --- | --- |
| Entity home | A single authoritative URL (homepage or About page) that acts as the central node for your identity | Low | Only works if the rest of your site agrees with it |
| Structured data (schema) | Machine-readable labels that tell engines what your content and organization are | Low–medium | Markup must match visible content or it's ignored |
| sameAs links | Connects your entity to authoritative references like Wikipedia, Wikidata, and official profiles | Very low | Useless if the linked profiles are abandoned or wrong |
| Wikidata / Wikipedia presence | A structured, globally recognized identifier for your entity | Low–high | Wikipedia has strict notability rules; Wikidata's are looser but still apply |
| Consistent NAP & details | Name, address, phone, and descriptors that match everywhere | Medium | One stale directory can fracture the entity |
| Third-party corroboration | Independent mentions that confirm what you say about yourself | High | Requires real PR and relationships, not link buys |

A few of these deserve unpacking. Google's own structured data documentation is explicit that markup isn't only about rich results: “Google uses structured data that it finds on the web to understand the content of the page, as well as to gather information about the web and the world in general, such as information about the people, books, or companies that are included in the markup.” That last clause — companies included in the markup — is precisely the entity-building the patent formalizes.

The sameAs property is the cheapest high-leverage move most businesses skip. Schema.org defines it as the “URL of a reference Web page that unambiguously indicates the item's identity. E.g. the URL of the item's Wikipedia page, Wikidata entry, or official website.” It's a direct way to say “this business is that known entity,” collapsing the disambiguation guesswork.
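
As a minimal sketch of what that looks like in practice, the Python below builds a schema.org Organization object with a sameAs list and wraps it as a JSON-LD script tag. Every value is a placeholder for a fictional business: the company name, the example.com URL, the Wikidata ID, and the LinkedIn path are assumptions you would swap for your own verified profiles. The `to_jsonld_script` helper is ours, not part of any library.

```python
import json

# All values below are placeholders for a fictional business.
organization = {
    "@context": "https://schema.org",
    "@type": "Organization",
    "name": "Acme Software Inc.",
    "url": "https://www.example.com/",
    "sameAs": [
        "https://www.wikidata.org/wiki/Q0000000",         # placeholder, not a real item
        "https://www.linkedin.com/company/example-acme",  # placeholder profile
    ],
}


def to_jsonld_script(data: dict) -> str:
    """Wrap a dict as a JSON-LD <script> tag for a page's HTML."""
    body = json.dumps(data, indent=2)
    return f'<script type="application/ld+json">\n{body}\n</script>'


snippet = to_jsonld_script(organization)
print(snippet)
```

The `name` here should match the visible name on the page and on the linked profiles; markup that disagrees with the rest of your footprint undercuts the point of adding it.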

Wikidata is the structured backbone many of these systems lean on — “a free, collaborative, multilingual, secondary knowledge base, collecting structured data,” where every item gets a unique Q-number (the project's own example is Douglas Adams, Q42). A Wikidata item gives AI a stable, machine-readable anchor for your identity. We walk through the technical mechanics of all of this in our piece on making your brand machine-readable.

Consistency is the unglamorous multiplier. As the DigitalApplied entity SEO guide notes, AI systems “actively seek consistent, verifiable data points across multiple sources before citing you” — and when your LinkedIn says “Acme Software Inc.,” your site says “Acme,” and a directory says “Acme Software,” models may treat those as different entities. The same guide reports brand mentions correlating with AI Overview visibility at roughly 0.664 versus 0.218 for backlinks (a third-party correlation figure, not causation), which lines up with the broader shift away from links and toward corroborated identity.
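
A quick way to spot this kind of drift is to normalize each listing's name and group the sources by the result. The sketch below uses the Acme example from the paragraph above; the suffix list and the normalization rules are our assumptions, and real AI systems may be more or less forgiving than this.

```python
import re

# Legal suffixes treated as noise when comparing names (assumed, incomplete).
SUFFIXES = {"inc", "llc", "ltd", "corp", "co"}


def normalize_name(name: str) -> str:
    """Lowercase, strip punctuation, and drop trailing legal suffixes."""
    words = re.sub(r"[^\w\s]", "", name.lower()).split()
    while words and words[-1] in SUFFIXES:
        words.pop()
    return " ".join(words)


listings = {
    "LinkedIn": "Acme Software Inc.",
    "Website": "Acme",
    "Directory": "Acme Software",
}

# Group sources by normalized name; more than one key means the name drifts.
variants: dict[str, list[str]] = {}
for source, name in listings.items():
    variants.setdefault(normalize_name(name), []).append(source)

for name, sources in variants.items():
    print(f"{name!r}: {sources}")
```

Even after the "Inc." suffix is ignored, the website's bare "Acme" still stands apart from the other two listings, which is the gap worth fixing first.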

How do you write content an LLM can actually read?

Even a perfectly modeled entity has to be expressed in pages an AI can extract. Search Engine Land's playbook for machine-readable content, by Myriam Jessier, is blunt about the constraint: LLMs work within a limited “grounding budget” of roughly 1,900 words per query, and individual pages get only a slice of that. The article reports that “pages under 5,000 characters get about 66% of their content used. Pages over 20,000 characters? 12%.”
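
The arithmetic behind those two figures is worth spelling out. The sketch below applies the reported usage rates at the two boundary lengths; the exact page lengths are our illustrative assumption, since the article quotes ranges ("under" and "over") rather than specific sizes.

```python
# Reported usage rates from the playbook quoted above, applied at the two
# boundary lengths. The specific page lengths are illustrative assumptions.
reported = [
    (5_000, 0.66),   # "pages under 5,000 characters get about 66% ... used"
    (20_000, 0.12),  # "pages over 20,000 characters? 12%"
]


def chars_used(page_chars: int, usage_rate: float) -> int:
    """Estimate how many characters of a page end up being used."""
    return round(page_chars * usage_rate)


for page_chars, rate in reported:
    used = chars_used(page_chars, rate)
    print(f"{page_chars:>6,} chars x {rate:.0%} = {used:,} chars used")
```

At these boundary values, a page four times longer contributes fewer characters in absolute terms (2,400 versus 3,300), so extra length is not free. Treat this as a rough reading of third-party figures, not a measured rule.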

The takeaways are practical:

  • Write self-contained sentences. Every sentence should survive on its own, without leaning on the previous one or a vague “it.” Name the entity explicitly.
  • Lead with the answer. Jessier describes an “AI inverted pyramid” — open a section with a dense 40–60 word declarative statement that contains your core claim and specifics, not a slow windup.
  • Use consensus terminology. Kopp's patent review notes a “weighted answer terms” mechanism: accurate sources tend to use the same vocabulary other experts use on a topic. Inventing your own jargon works against you.
  • Structure for extraction. Jessier reports clear headings can improve semantic relevance by up to 17.54%, and that LLMs pull most reliably from the beginnings and endings of passages.
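
The first two takeaways can be spot-checked mechanically. The following Python sketch counts the words in a section's opening paragraph against the 40–60 word guideline and flags sentences that begin with a vague pronoun. The pronoun list, the naive sentence splitter, and the `audit_opening` name are our assumptions; this is a rough editing aid, not a measure of how any LLM scores content.

```python
import re

# Pronouns that usually lean on a previous sentence (assumed list).
VAGUE_OPENERS = {"it", "this", "that", "they", "these", "those"}


def audit_opening(paragraph: str, low: int = 40, high: int = 60) -> dict:
    """Check an opening paragraph's word count and flag sentences that
    start with a vague pronoun."""
    word_count = len(paragraph.split())
    # Naive split on sentence-ending punctuation followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", paragraph.strip()) if s]
    vague = [
        s for s in sentences
        if s.split()[0].strip("\"'“”").lower() in VAGUE_OPENERS
    ]
    return {
        "word_count": word_count,
        "in_range": low <= word_count <= high,
        "vague_sentences": vague,
    }


opening = (
    "Acme Software builds payroll software for small businesses in Fort Wayne. "
    "It also handles tax filing."
)
result = audit_opening(opening)
print(result)
```

On this fictional opening, the check reports 16 words (too thin for a dense lead) and flags the second sentence, which would read better as "Acme Software also handles tax filing."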

This is where technical groundwork pays off. If AI crawlers can't reach or parse your content, none of the above matters — which is the case we made in our guide to LLMs.txt and AI discoverability. And the deeper point holds: clarity for machines starts with clarity for yourself. You can't teach AI a crisp identity you haven't actually defined, which is the argument behind brand clarity being the new SEO.

What's the local angle for Northeast Indiana businesses?

For small and mid-size businesses across Fort Wayne, Allen County, DeKalb County, and the broader Midwest, the entity-identity shift is more opportunity than threat. National competitors often have louder signals — more mentions, bigger budgets — but those signals are frequently inconsistent across hundreds of locations and listings. A focused local business can present a cleaner, more coherent entity than a sprawling national brand, and a clean entity is exactly what the patent rewards.

The most common local failure point is also the most fixable: inconsistent business details. A practice that lists one address on Google, a slightly different name on Facebook, and an old phone number on a directory is actively teaching AI that there are three fuzzy entities instead of one trustworthy one. We've detailed how to lock this down in our guide to NAP consistency for AI bots. For a Fort Wayne contractor, accountant, or clinic, getting name, address, phone, and service descriptions to match everywhere is high-leverage, low-cost work — and it directly improves the characterization an AI builds of you.
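
A simple audit along these lines compares each field across listings and flags whatever disagrees with the most common value. The sketch below is a minimal version under stated assumptions: the listings are fictional (the 555-01xx phone numbers are reserved for examples), and the majority-vote rule and the `find_outliers` helper are ours. In practice the "correct" value is whatever your business actually uses, not necessarily the majority.

```python
import re
from collections import Counter


def digits_only(phone: str) -> str:
    """Reduce a phone number to its digits so formatting differences vanish."""
    return re.sub(r"\D", "", phone)


# Fictional listings for one practice.
listings = {
    "Google": {"name": "Acme Clinic", "address": "100 Main St", "phone": "(260) 555-0100"},
    "Facebook": {"name": "Acme Clinic LLC", "address": "100 Main St", "phone": "260-555-0100"},
    "Directory": {"name": "Acme Clinic", "address": "100 Main St", "phone": "260-555-0199"},
}


def find_outliers(records: dict) -> list[tuple[str, str, str]]:
    """Return (source, field, value) for each value that differs from the
    most common value of that field."""
    outliers = []
    for field_name in ("name", "address", "phone"):
        clean = digits_only if field_name == "phone" else str.strip
        values = {src: clean(rec[field_name]) for src, rec in records.items()}
        majority, _ = Counter(values.values()).most_common(1)[0]
        outliers += [(src, field_name, v) for src, v in values.items() if v != majority]
    return outliers


for row in find_outliers(listings):
    print(row)
```

The two phone formats on Google and Facebook are treated as the same number, so the audit surfaces only the real problems: the extra "LLC" on Facebook and the old number on the directory.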

How should you act on a patent — without overreacting?

Here's the discipline we'd recommend: treat the patent as a map of intent, not a confirmed feature. Patents describe an invention an organization has claimed, not a product it has committed to build; many never ship, and Google has not confirmed that this exact system drives live rankings. Anyone promising guaranteed AI visibility from “the patent” is overselling.

What makes this guidance safe to follow anyway is that none of it is speculative on its own. Defining your identity clearly, marking it up with structured data, linking it to authoritative references, keeping your details consistent, and writing extractable content were all good ideas before this patent surfaced. They're durable practices that help with traditional search, AI Overviews, and chat-based assistants alike. The patent is useful mainly because it explains why they work — it shows the machinery underneath. Build for an accurate, consistent identity, and you're aligned with where AI search is going regardless of which specific patent becomes a product.

Make your business legible to AI search

Teaching AI who you are is genuinely technical work — auditing your entity signals, fixing inconsistencies across the web, implementing the right structured data, and rewriting key pages so machines and people both understand them. It's also the kind of work that compounds quietly: every fix makes your characterization a little sharper.

Ready to make your identity unmistakable?

Our AEO services are built to do exactly this for small and mid-size businesses in Fort Wayne and across Northeast Indiana. We start with an entity audit — what AI currently “thinks” you are versus what you actually are — and close the gaps from there. If AI search is starting to shape who finds you, let's talk.

Frequently Asked Questions

What is Google’s "Data extraction using LLMs" patent about?
It is a patent filing that describes a system for reading a website and other public sources, then using a large language model to synthesize an understanding of the entity behind that domain. Per Search Engine Land’s reporting, the output is an interpretation of your content—a structured "characterization" of who you are—rather than a verbatim copy of your pages.
Does this patent mean keyword SEO is dead?
No. Keywords still describe what people search for and remain useful. The shift is additive: alongside ranking for terms, you now also need a clear, consistent entity identity so AI systems understand and can accurately represent your business. Think "keywords plus identity," not "identity instead of keywords."
What is an entity in SEO?
An entity is a uniquely identifiable thing or concept—a business, person, place, or product—that search systems map to a unique identifier rather than treating as a text string. Google’s definition, cited by OutpaceSEO, is "a thing or concept that is singular, unique, well-defined, and distinguishable." Entity SEO is the practice of making your business one of those clearly defined things.
What are the most important signals for teaching AI who I am?
The highest-leverage signals are a single authoritative "entity home" page, accurate structured data (schema), sameAs links to references like Wikipedia and Wikidata, and consistent business details—name, address, phone, and descriptions—across every platform. Independent third-party mentions that corroborate your claims add further trust.
Should I change my whole strategy because of one patent?
Be measured. A patent shows what a company could build, not what is confirmed to run in live ranking. Fortunately, the actions it points to—clear identity, structured data, consistency, extractable content—are durable best practices that help across traditional and AI search, so following them is low-risk even if this specific patent never ships as a product.
How can a small Fort Wayne business compete on entity signals?
By being cleaner, not louder. National brands often have inconsistent signals spread across many locations. A focused local business can present one coherent identity—consistent NAP, clear structured data, an authoritative entity home—which is exactly what AI systems reward. Start by making your business details identical everywhere they appear online.

Sources & Further Reading