
Introduction
If you have ever sat through a GEO (“generative engine optimization”) pitch in 2026, you have probably heard the same one-line promise three different ways: we'll get your brand cited by ChatGPT. That sentence sounds like GEO measurement. It is not. It is the AI-search equivalent of a 2010 SEO firm promising you a “page-one ranking” without telling you what query, what device, or what city — and without telling you whether anyone who clicks actually buys.
In a framework published May 18, 2026, in Search Engine Land, Paul DeMott argues that “AI search measurement in 2026 looks a lot like paid media in 2008. Everyone can see the impressions. Almost nobody can defend the revenue.” His prescription is a five-layer instrumentation stack that takes you from “did our brand appear” all the way to “did AI-influenced traffic close.” We have rebuilt that stack for the small-business reality our Northeast Indiana clients actually operate in — different budgets, different headcounts, different patience curves.
This is a hub post. Each layer below has a deeper Button Block companion piece — and most importantly, each layer has an honest disclosure of what its data can and cannot tell you in 2026.
Key Takeaways
- Single-metric GEO programs (appearances, citation counts, prompt coverage) are the new keyword-ranking trap. They look defensible in a slide and don't survive a CFO question.
- A defensible measurement stack has five layers: direct attribution, crawl-log diagnostics, share-of-voice plus AI interrogation, self-reported pipeline, and incrementality testing.
- Most small businesses can instrument layers 1 through 4 with free or near-free tools — GA4, server logs, a spreadsheet of prompts, and CRM form fields.
- Layer 5 (incrementality) is the most likely to mislead an SMB. Treat it as directional, not deterministic, and disclose its limits.
- None of these layers proves AI-search impact on its own. The point is to see whether they move together.
Why One-Metric GEO Measurement Keeps Failing
The cleanest signal that an early-stage discipline is over-claiming is that everyone reports the same single number. In 2010, that number was “keyword rank.” In 2026, it is “citations in ChatGPT,” or “appearances in Perplexity,” or “share of voice in Gemini.” The DeMott framework opens with a blunt line about this: “Citation share, presence rate, and AI Overview appearance counts are the new domain authority. They look defensible in a slide. For 95% of the agencies selling them, they aren't connected to pipelines in any rigorous way.”
We see the same trap on the SMB side. A Fort Wayne client will forward us a vendor dashboard showing they “appear in 27 ChatGPT answers this month, up from 19” and ask whether to renew. The honest answer is we don't know, because that number — by itself — has no relationship to whether anyone bought anything. We have written about this anti-pattern at length in why prompt volume is the wrong GEO metric: tracking 500 prompts where you appear is a vanity exercise unless the 10 prompts that drive your revenue are inside that 500. Search Engine Land's earlier reporting on the eight GEO metrics worth tracking makes a parallel point — the useful number is rarely the one a vendor leads with.
The five-layer framework reframes the question. Instead of “did we appear?” you ask five different questions in sequence:
- Did anyone click from an AI tool to our site? (Direct attribution.)
- Are AI systems crawling us enough to even know we exist? (Crawl-log diagnostics.)
- When AI does cite us, is the description accurate, and how does our share compare to competitors? (Share-of-voice plus AI interrogation.)
- Are customers telling us they used AI to find us? (Self-report.)
- When we run a real GEO program, do clients with the program meaningfully outperform clients without it? (Incrementality.)
No single layer answers “is GEO working.” Together, they triangulate a defensible story. DeMott's own framing of the goal is the right one for SMB readers: “When the layers move together, the story is real.”

Layer 1 — Direct Attribution: Did AI Traffic Land on Our Site?
Layer 1 is the simplest layer to start, and also the most misleading if you stop there. The job is to count human clicks from AI surfaces to your website — separated from the “Direct” bucket where GA4 stuffs everything it can't classify.
What you measure. Sessions and conversions where the referrer string contains a known AI host: chatgpt.com, chat.openai.com, perplexity.ai, gemini.google.com, copilot.microsoft.com, claude.ai, and the newer agentic browsers and search modes as they emerge. Most teams set this up as a custom channel grouping in GA4 plus a custom dimension on user-agent. We walk through the implementation step-by-step in our GA4 AI Assistant channel setup guide, which mirrors Google's GA4 channel grouping documentation.
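Before pasting a host pattern into a GA4 channel-group condition, it can help to check it offline against a handful of exported referrer strings. The sketch below is one way to do that in Python. The host list comes from the paragraph above; the function name `classify_referrer` and the three labels it returns are our own illustrative choices, not GA4 terminology.

```python
import re
from urllib.parse import urlparse

# Hosts named in the paragraph above; extend this tuple as new AI surfaces appear.
AI_HOSTS = (
    "chatgpt.com",
    "chat.openai.com",
    "perplexity.ai",
    "gemini.google.com",
    "copilot.microsoft.com",
    "claude.ai",
)

# Match the host itself or any subdomain of it (for example www.perplexity.ai),
# but not look-alike domains such as notchatgpt.com.
AI_HOST_PATTERN = re.compile(
    r"(^|\.)(" + "|".join(re.escape(h) for h in AI_HOSTS) + r")$"
)


def classify_referrer(referrer: str) -> str:
    """Return 'ai_assistant', 'direct', or 'other' for a full referrer URL.

    The labels are hypothetical names for this sketch.
    """
    if not referrer:
        # An empty referrer is the case that ends up in the Direct bucket.
        return "direct"
    host = (urlparse(referrer).hostname or "").lower()
    return "ai_assistant" if AI_HOST_PATTERN.search(host) else "other"
```

Running a week of exported referrers through a function like this is a quick way to spot hosts the pattern misses before the weekly review.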
Honest limitations. This is where the SEL article makes the strongest cautionary point: “Layer 1 is necessary, but it's the tip of an iceberg that's getting smaller every quarter.” DeMott cites a Loamly analysis of 446,405 visits in early 2026 finding that “70.6% of AI traffic in its dataset landed as Direct in GA4 by default.” Agentic browsers strip referrers. Some AI sessions never leave the model: the tool answers the question and the user never clicks through. If layer 1 is your only signal, you are reading a thermometer that's missing more than two-thirds of its mercury.
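To make the size of that blind spot concrete, here is a back-of-envelope sketch. It assumes, purely for illustration, that the 70.6% Direct share from the Loamly dataset applies to your own site, which it may well not. The function name and the default value are ours.

```python
def estimate_total_ai_sessions(tagged_sessions: int, dark_share: float = 0.706) -> float:
    """Scale up referrer-tagged AI sessions by an assumed untagged share.

    dark_share is the fraction of AI traffic assumed to land as Direct.
    The 0.706 default is the Loamly figure quoted above, used here as an
    assumption and not as a measured property of any particular site.
    """
    if not 0 <= dark_share < 1:
        raise ValueError("dark_share must be in [0, 1)")
    return tagged_sessions / (1 - dark_share)
```

Under that assumption, 294 tagged sessions would imply roughly 1,000 AI-originated sessions in total. Treat the output as a reason to instrument the other layers, not as a number to report.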
Recommended cadence. We recommend reviewing your custom AI channel group weekly for the first ninety days (to confirm the regex captures the hosts you expect) and then monthly. Pair layer 1 with the Microsoft Clarity AI citations report we cover in the Clarity AI citations dashboard piece — Clarity surfaces the citing model directly, which is one of the few free ways to corroborate a referrer string. Microsoft's Clarity documentation explains how the AI citations feed is populated.
Layer 2 — Crawl-Log Diagnostics: Are AI Systems Even Reading Us?
If layer 1 is “did anyone click,” layer 2 is “did anyone read.” Before any AI surface can cite you, its bot has to fetch you. Server logs make that visible.
What you measure. AI bot hits in your access logs, separated into the three buckets DeMott names: training crawlers (GPTBot, ClaudeBot, anthropic-ai, CCBot, Bytespider), search-and-indexing crawlers (OAI-SearchBot, Claude-SearchBot, PerplexityBot, DuckAssistBot), and user-triggered fetchers (ChatGPT-User, Claude-User, Perplexity-User, MistralAI-User). The third bucket is the most operationally interesting — it represents a real user prompting an AI tool that then went to your page in real time.
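The three buckets can be sketched as a simple user-agent lookup. The bot tokens below are the ones listed in the paragraph above; the bucket keys and the function name `bucket_for_user_agent` are illustrative. User-agent strings are easy to spoof, so a match here is a first pass, not verification.

```python
from typing import Optional

# Tokens are taken from the three buckets named above; keys are our own labels.
BOT_BUCKETS = {
    "training": ("GPTBot", "ClaudeBot", "anthropic-ai", "CCBot", "Bytespider"),
    "search_indexing": ("OAI-SearchBot", "Claude-SearchBot", "PerplexityBot", "DuckAssistBot"),
    "user_triggered": ("ChatGPT-User", "Claude-User", "Perplexity-User", "MistralAI-User"),
}


def bucket_for_user_agent(user_agent: str) -> Optional[str]:
    """Return the bucket whose token appears in the user-agent, or None."""
    ua = user_agent.lower()
    for bucket, tokens in BOT_BUCKETS.items():
        if any(token.lower() in ua for token in tokens):
            return bucket
    return None
```

Counting hits per bucket per week from the user-agent column of an access log gives the raw numbers for the scorecard.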
DeMott reports several striking crawl-to-referral ratios for context. Cloudflare's June 2025 data showed roughly 73,000 Anthropic crawler hits per referral; OpenAI was around 1,700-to-1; Google's classic web crawler was about 14-to-1. SEOmator's Q1 2026 figures put ClaudeBot at almost 24,000-to-1 and GPTBot at about 1,276-to-1. The takeaway is not the precise ratio; it is that AI bots are reading orders of magnitude more pages than they are sending users. Crawl volume is an eligibility signal, not a demand signal.
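If you want the same ratio for your own site, the arithmetic is just bot hits divided by referred sessions over the same window. A minimal sketch, with a hypothetical function name, that avoids dividing by zero when a bot has sent no visitors yet:

```python
from typing import Optional


def crawl_to_referral_ratio(bot_hits: int, referred_sessions: int) -> Optional[float]:
    """Crawler hits per referred session for one AI vendor over one time window.

    Returns None when there were no referred sessions, since the ratio is
    undefined in that case.
    """
    if referred_sessions == 0:
        return None
    return bot_hits / referred_sessions
```

Bot hits come from the layer 2 logs and referred sessions from the layer 1 channel group, so the two inputs need to cover the same dates and the same vendor.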
Tools. A free path is to ship raw access logs into Claude or any general-purpose LLM and ask it to categorize by user-agent and verify against published IP ranges (OpenAI publishes searchbot.json and chatgpt-user.json). For mid-market teams, our log file analysis for AI crawlers piece walks through a more durable Screaming Frog plus BigQuery pipeline. Google's own Search Central documentation is the cleanest cross-reference for the Googlebot variants you'll see alongside the AI crawlers in your logs.
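The IP-range check can also be done without an LLM. The sketch below uses Python's standard `ipaddress` module and assumes you have already extracted the CIDR blocks from a vendor's published file into a plain list of strings; how you parse that file is left out because we are not asserting its exact structure here. The function name is illustrative.

```python
import ipaddress
from typing import Iterable


def ip_in_published_ranges(ip: str, cidr_blocks: Iterable[str]) -> bool:
    """Return True if ip falls inside any of the given CIDR blocks.

    cidr_blocks is assumed to be a list such as ["192.0.2.0/24"], built by
    the caller from a vendor's published range file.
    """
    address = ipaddress.ip_address(ip)
    for block in cidr_blocks:
        network = ipaddress.ip_network(block, strict=False)
        if address in network:
            return True
    return False
```

A log line whose user-agent claims to be an AI bot but whose IP fails this check is a candidate for the "spoofed" pile. The `192.0.2.0/24` block in the docstring is a reserved documentation range, not a real vendor range.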
Honest limitations. Bot logs tell you “the building has visitors.” They do not tell you “the visitors are buying.” Many small businesses also discover, on first analysis, that their site is being crawled aggressively by AI bots that find very little crawlable content because the page is JavaScript-heavy.
Layer 3 — Share of Voice and AI Interrogation: When AI Talks About Us, What Does It Say?
Layer 3 is two layers braided together. The first asks how often your brand appears in AI answers across a defined prompt set; the second asks what AI actually says about your brand when it does mention you. We treat them as a pair because either one alone is dangerous.
Layer 3a — share of voice. You define a list of 10-15 prompts that matter to your business (queries a buyer would realistically ask an AI tool before contacting you) and poll multiple models — GPT, Claude, Gemini, Perplexity — at a fixed cadence. You count the percentage of answers that mention you versus a competitor set. Vendor tools in this space include Profound, AthenaHQ, Peec, Semrush AI Visibility, and Ahrefs Brand Radar; you can also script your own with the OpenAI and Anthropic APIs. DeMott is direct about the limits: “SOV alone is a vanity metric. It tells you whether you're appearing in answers, not whether anyone is buying.” And vendor counts diverge — he flags “wildly different counts across Profound, AthenaHQ, Otterly, Semrush, and Ahrefs Brand Radar” for the same query. Treat any single tool's number as a directional estimate, not a fact. Search Engine Land's earlier analysis of the four signals that now define visibility in AI search is the strongest companion read here.
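For teams scripting their own count, the share-of-voice arithmetic is simple once the answers are collected. The sketch below assumes you have already saved each model's answer text for each prompt into a flat list; the brand names in the usage note are invented, and plain substring matching is a deliberate simplification that will miss misspellings and abbreviations.

```python
from typing import Dict, List


def share_of_voice(answers: List[str], brands: List[str]) -> Dict[str, float]:
    """Percentage of answers that mention each brand (case-insensitive substring match)."""
    total = len(answers)
    result: Dict[str, float] = {}
    for brand in brands:
        mentions = sum(1 for answer in answers if brand.lower() in answer.lower())
        result[brand] = round(100 * mentions / total, 1) if total else 0.0
    return result
```

With four saved answers where a hypothetical "Acme Dental" appears in three and "Bright Smiles" in one, the function returns 75.0 and 25.0. As with any single tool's number, treat the result as directional.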
Layer 3b — AI interrogation. The other half of layer 3 is asking AI tools to describe your brand. The SEL article suggests prompts like “Who is the ideal customer for [brand]?”, “What are [brand]'s strengths and weaknesses?”, “Why choose [brand] over its top three competitors?”, “What is [brand] known for in [industry]?” The output is a brand-reputation audit at AI-search-surface scale. DeMott's analogy is the one that resonates with us: “Imagine you sent a brand-new sales rep to a networking event with no briefing... AI is doing this on your behalf right now, at scale.” Search Engine Land's separate framing of GEO as a reputation problem, not a content problem reinforces why layer 3b matters at least as much as 3a.
What to do with the data. Layer 3 surfaces three actionable categories: (1) accuracy issues (the model is wrong about something basic — phone number, service area, pricing model), (2) framing gaps (it correctly summarizes a competitor's positioning but flattens yours), and (3) sourcing gaps (when asked for sources, it cites third-party listicles instead of your own site). The first is a content fix. The second is a positioning fix. The third is a structured-data fix — name the entity clearly with Schema.org Organization and FAQPage markup so the model can ground its answer in your site, not a directory listicle.
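As a rough illustration of the structured-data fix, the sketch below builds a minimal Schema.org Organization block as JSON-LD. The properties used (`name`, `url`, `telephone`, `areaServed`) are standard Organization properties; the function name and every value you pass in are placeholders, and a real page would likely carry more fields.

```python
import json


def organization_jsonld(name: str, url: str, telephone: str, area_served: str) -> str:
    """Return a minimal Schema.org Organization object as a JSON-LD string."""
    data = {
        "@context": "https://schema.org",
        "@type": "Organization",
        "name": name,
        "url": url,
        "telephone": telephone,
        "areaServed": area_served,
    }
    return json.dumps(data, indent=2)
```

The returned string is meant to sit inside a `<script type="application/ld+json">` tag in the page head, so the basics a model tends to get wrong (phone number, service area) are stated once in a machine-readable place.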

Layer 4 — Self-Report: Are Customers Telling Us They Used AI?
Layer 4 is the cheapest, most under-used measurement layer in the stack and the one we recommend every small business turn on this week, regardless of GEO program maturity. Add one form field. Ask one sales-call question. Tag one CRM property.
What you measure. A “How did you hear about us?” form field with explicit AI-tool options — ChatGPT, Perplexity, Gemini, Claude, Copilot, “other AI tool” — plus an open-text “What did you ask?” field. On the phone or in discovery calls, train sales to ask, “Did you use ChatGPT or another AI tool while you were researching this?” and log the answer in the CRM. Track which deals these leads come from, what they're worth, and how often they close.
Why this matters. DeMott reports the most useful finding in the entire framework here: “Self-reported attribution from forms and sales conversations consistently surfaces double-digit percentages of pipeline as AI-influenced, even when CRM source attribution shows under 1%. That delta is the dark funnel made visible.” That is the gap between what your analytics says happened and what your customers say happened. For most of our SMB clients, layer 4 is the first piece of evidence that GEO is doing anything at all — and it's free.
Honest limitations. Self-report is biased. People misremember which AI tool they used, conflate AI Overviews with classic Google search, or volunteer “ChatGPT” because it sounds modern. Cross-reference against layer 3a — if a self-reported AI-influenced deal involved a prompt where layer 3a shows your brand appears, that's two signals pointing the same direction. The triangulation only works if the form field and the CRM property survive every site update — instrument it once, then audit it quarterly.
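The delta between self-report and CRM source attribution can be computed from a plain CRM export. This sketch assumes each deal is a dict with three hypothetical keys (`value`, `self_reported_ai`, `crm_source_ai`); your CRM's field names will differ.

```python
from typing import Dict, List, Tuple


def ai_influenced_share(deals: List[Dict]) -> Tuple[float, float]:
    """Return (self-reported %, CRM-attributed %) of pipeline value tagged as AI.

    Each deal is assumed to carry 'value' (a number), 'self_reported_ai' (bool),
    and 'crm_source_ai' (bool). These key names are illustrative.
    """
    total = sum(deal["value"] for deal in deals)
    if total == 0:
        return (0.0, 0.0)
    self_reported = sum(deal["value"] for deal in deals if deal["self_reported_ai"])
    crm_attributed = sum(deal["value"] for deal in deals if deal["crm_source_ai"])
    return (100 * self_reported / total, 100 * crm_attributed / total)
```

The gap between the two numbers is the "dark funnel" delta described above. Because self-report is biased, the first number is an upper-leaning estimate, not a measurement.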
Layer 5 — Incrementality: Would the Pipeline Have Arrived Anyway?
Layer 5 is the layer that gives small-business owners the most heartburn — and the one most prone to misuse. Incrementality testing asks: if we hadn't run a GEO program, would the same revenue have shown up anyway?
What you measure. A difference-in-differences view across a portfolio of clients (or, internally, across business units, regions, or time periods) where some have a full GEO program, some have a light version, and some have none. You match on pre-treatment covariates (industry vertical, starting traffic, starting pipeline, branded search volume) and track branded search and pipeline over a six-to-twelve-month window. The honest version of this analysis reports confidence ranges, minimum detectable effect, and null results when null results occur.
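The core difference-in-differences arithmetic is small enough to sketch. The version below compares the change in a metric's mean for a treated group against the change for a control group. It produces a point estimate only: the matching on covariates, confidence ranges, and minimum detectable effect described above are not covered, and the function name is ours.

```python
from statistics import mean
from typing import Sequence


def difference_in_differences(
    treated_pre: Sequence[float],
    treated_post: Sequence[float],
    control_pre: Sequence[float],
    control_post: Sequence[float],
) -> float:
    """(Change in treated mean) minus (change in control mean).

    Each argument holds one metric value per client or unit, for example
    monthly pipeline, before and after the program window.
    """
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treated_change - control_change
```

If the treated group's mean rises from 100 to 130 while the control group's rises from 100 to 110, the estimate is 20. With only a handful of units per group, a number like that can easily be noise, which is why the honest version reports ranges and null results.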
Why this is hard for SMBs. Most small businesses do not have the comparison set to run a credible incrementality test on themselves. Agencies and consultancies (including us) can run portfolio-level analyses across clients, but as DeMott warns, “once you stratify by vertical and starting size, your effective sample per cell drops fast” and “a properly run benchmark can still show zero measurable lift.” We recommend SMB readers treat any agency-published incrementality lift number with the same skepticism they'd bring to a paid-media case study — read the methodology, look for null results in the same report, and ask what the confidence bands were.
Our SMB recommendation. If you are a single business measuring your own GEO program, run a simpler version of layer 5: hold one segment of your prompt set static (a control set you do not actively optimize for) and instrument layers 1 through 4 across both your active and your control prompts. If your active set's metrics improve and your control set's metrics drift sideways, that's a directional signal. It is not deterministic proof. Layer 5 always carries the caveat DeMott names directly: “This is a benchmark study, not a clinical trial.”
A 5-Layer Scorecard You Can Copy into a Google Sheet
We use a single scorecard with our Fort Wayne and Auburn clients to track all five layers in one place. The columns:
| Layer | Metric we track | Tool | Cadence | What it tells us | What it does NOT tell us |
|---|---|---|---|---|---|
| 1 — Direct attribution | Sessions, conversions from AI hosts | GA4 custom channel | Weekly (first 90 days), then monthly | Bottom-of-funnel AI clicks | The “dark funnel” of zero-click AI answers |
| 2 — Crawl-log diagnostics | Hits by bot category, weekly median | Server logs + Claude | Weekly (first 90 days), then monthly | AI eligibility and crawl pressure | Whether bots are finding cite-worthy content |
| 3a — Share of voice | % of answers mentioning us across 10-15 prompts × 4 models | Spreadsheet or vendor tool | Monthly | Visibility trend over time | Whether the visibility is accurate |
| 3b — AI interrogation | Accuracy, ICP fit, sourcing of AI's brand description | Manual prompt set across 3 models | Monthly | Brand-narrative correctness | Whether the narrative drives revenue |
| 4 — Self-report | % of leads citing an AI tool, $ pipeline tagged AI-influenced | Form field + CRM property | Weekly review | Real customer recall | True share — self-report is biased |
| 5 — Incrementality | Pipeline trajectory of GEO-on vs GEO-off cohort | Portfolio matched comparison | Quarterly | Directional macro impact | Causal proof at a single-business level |
The Button Block recommendation is to instrument layers 1 through 4 well before you spend time on layer 5. Most of the SMB owners we work with cannot meaningfully act on layer 5 numbers until layers 1 through 4 are clean. We have seen multiple cases where a vendor's “incrementality lift” headline number distracted from a layer-2 problem (the site was JavaScript-heavy, AI bots saw nothing, and no amount of layer-5 modeling changed that). Our piece on the eight GEO metrics we actually track is the metric-by-metric companion to this scorecard.


How the Framework Lands for a Fort Wayne or Auburn Small Business
The framework is universal, but small businesses in Allen County and DeKalb County operate under different constraints than enterprise teams — and the right starter version of the stack reflects that. A Fort Wayne dental practice does not need Profound, AthenaHQ, and a custom Anthropic-API SOV pipeline. It needs a working GA4 AI channel group, one form-field change, and a quarterly hand-checked prompt-set review.
Three concrete starter scenarios from clients we work with across Northeast Indiana:
- A Fort Wayne professional services firm runs layer 1 in GA4, asks one self-report question on every consultation request, and audits twelve AI interrogation prompts every ninety days. Total instrumentation cost: zero dollars, about two hours per quarter.
- An Auburn boutique e-commerce shop runs layers 1, 2, and 4 — GA4 channel groups, monthly log analysis through a script that hits the Clarity citing-model export, and a tagged “AI source” property on every Shopify order's notes field.
- A Northeast Indiana home-services contractor focuses on layer 4 first — the team rebuilt the “How did you hear about us?” intake form in two days and started catching AI-cited leads they had been blanket-bucketing as “Google” for a year.
None of these businesses can credibly attempt layer 5 alone. None of them needs to. The point of the framework is to know which layers you can defend with data and which you cannot, and to be honest with yourself about the difference.
What This Framework Does Not Measure
Two honest disclosures before we close. First, the five layers measure AI-search-surface activity. They do not measure brand sentiment outside AI surfaces — Reddit discussions, LinkedIn comments, organic social conversations about your category — which is increasingly where AI models pull training data from. A complete picture would pair the five layers with reputation monitoring outside the AI tool itself. Second, this framework will keep shifting. The hosts list in layer 1 changes every quarter. The bot categories in layer 2 change every six months. Build the instrumentation flexible enough to add new hosts and new bots, and revisit the framework every six months — not because it's wrong, but because the surface area it measures keeps moving.

Work With Us on Measurement
If you are a Northeast Indiana small business trying to figure out whether your GEO investment is actually doing anything, the 5-layer scorecard above is the place to start. We run instrumentation, scorecard setup, and quarterly review as part of our Answer Engine Optimization service and SEO service for Fort Wayne, Auburn, and Allen County clients. You do not need all five layers on day one. You need to know which one to start with — and what it will and will not tell you.
Frequently Asked Questions
- How is GEO measurement different from traditional SEO measurement?
- Traditional SEO measurement is mostly built around one well-defined surface (Google’s organic blue links) and one widely available metric (rank for a given query). GEO measurement spans multiple AI hosts, multiple model versions, and a much higher rate of zero-click answers — so a single metric like "rank" no longer captures whether you’re visible. The five-layer framework exists because no single GEO metric is reliable on its own.
- Can a small business measure GEO without paying for a SOV tool?
- Yes. Layers 1, 2, 4, and the layer-3b interrogation step can all be instrumented with free tools — GA4, server access logs analyzed via a general-purpose LLM, CRM form fields, and a manually run prompt set. Vendor SOV tools speed up layer 3a, but they’re not required to get started. We recommend running layers 1 and 4 for ninety days before investing in any paid vendor.
- How often should we review the scorecard?
- Layer 1 and layer 4 are useful weekly during setup and then monthly. Layer 2 (crawl logs) is weekly for the first ninety days while you confirm bot identification, then monthly. Layers 3a and 3b are monthly. Layer 5 (incrementality) is quarterly at the earliest. Reviewing too often invites overreaction to short-term noise — most AI-search signal moves on a four-to-twelve-week lag.
- What’s the single biggest measurement mistake small businesses make in 2026?
- Treating "appearances" or "citation counts" as a stand-alone success metric, the way teams treated "keyword ranking" in 2010. Appearances tell you you’re eligible. They do not tell you anyone clicked, anyone bought, or anyone even read the AI’s full answer. The fix is to instrument at least layers 1, 2, and 4 alongside any vendor-supplied appearance count.
- Is layer 5 (incrementality) worth attempting for a single SMB?
- Probably not at the full difference-in-differences level. Most single small businesses do not have the comparison set to run a credible test. A simpler version gives a directional signal without overclaiming: hold one set of prompts as a control that you do not actively optimize for, then watch whether the layer 1 through 4 metrics for the active set outperform the control set.
- How long should a small business expect to wait before GEO investment shows up in the scorecard?
- In our experience, layer 2 (crawl frequency) and layer 3a (appearance share) move within four to eight weeks of consistent publishing. Layer 1 (referral traffic) and layer 4 (self-report) tend to take twelve to twenty-four weeks. Layer 5 takes six to twelve months and may still return a null result. Set timeline expectations accordingly with internal stakeholders before you start.
- Where should a Fort Wayne or Allen County small business start with this GEO measurement framework?
- In our experience with Northeast Indiana clients, the right starting pair is layer 1 (GA4 AI channel group) plus layer 4 (one "How did you hear about us?" intake-form change with explicit AI options). Both are free, both can be live within an afternoon, and both start producing useful data inside thirty days. Add layer 2 (server log review) only once you have a developer or agency comfortable with access logs — for many Fort Wayne and Auburn small businesses that means waiting until the next site update cycle.
Sources & Further Reading
- Search Engine Land: The 5-layer framework for measuring GEO performance — Paul DeMott, May 18, 2026.
- Search Engine Land: 8 GEO metrics to track in 2026 — May 7, 2026.
- Search Engine Land: Why GEO is a reputation problem — April 24, 2026.
- Search Engine Land: 4 signals that now define visibility in AI search — April 29, 2026.
- Google Search Central: Google Search documentation — canonical reference for Googlebot and indexing behavior.
- Microsoft Learn: Microsoft Clarity documentation — covers the AI citations feed and integration.
- Google: Google Analytics 4 support — channel grouping and custom dimension docs.
- web.dev: Measure performance with the Web Vitals — performance measurement methodology.
- Schema.org: Schema.org vocabulary — structured-data reference for Organization and FAQPage markup.
