
Introduction
A cryptocurrency news publisher watched its Brazilian site fall from 15,000–25,000 daily clicks to roughly 2,000–4,000 — and stay there for more than a year — after a botched domain migration, according to a Search Engine Land case study published 2026-05-12. The mechanism behind that collapse wasn't a Google penalty or a content quality problem. It was something quieter and easier to miss: thousands of pages returning HTTP 200 with nothing meaningful on them. Soft 404s.
Small business sites ship soft 404s all the time. A CMS upgrade silently strips a template. A faceted-filter URL on a Shopify store renders an empty product grid. A Next.js dynamic route falls back to a “no results” view. The page works for a human visitor — and for Google, it looks like a thin, low-value page that doesn't deserve to be indexed.
This playbook is the 60-minute version we run for clients. It works whether you're on WordPress, Shopify, Next.js, or anything else. You don't need a developer to start it — you just need a verified Google Search Console account and a willingness to look at the messy data. If you've been wondering why your organic traffic keeps drifting downward, the answer might be sitting in the Pages report right now. Our broader Where did my website traffic go? piece covers the AI Overviews angle; this one focuses on the indexing side that nobody talks about.
Key Takeaways
- Soft 404s are pages that return HTTP 200 but look empty or low-value to search engines. They quietly accumulate after CMS migrations, template changes, and parameter-URL bloat — and they waste the crawl budget your real pages need.
- The Search Engine Land case study tracked ~120,000 soft 404 pages across 13 country domains; soft 404 errors dropped 83% from peak after remediation.
- The fastest diagnostic is Search Console's Page Indexing report → filter by “Soft 404” and “Crawled — currently not indexed” — both are crawl-budget waste indicators.
- Common small business culprits: empty search results pages, faceted filter URLs, expired product pages on Shopify, paginated archives, and JavaScript-rendered pages that look empty to crawlers.
- Recovery isn't instant. The case study site needed ~12 weeks to drop soft 404s 69% after fixes shipped — and even then, traffic recovered only to 25–35% of pre-collapse forecasts.
What Are Soft 404s and Why Does Google Deprioritize Your Whole Site Because of Them?
A traditional 404 is honest: the server says “this page doesn't exist,” and Google removes it from the index. A soft 404 is a page that lies. The server says 200 OK and serves you HTML, but the HTML contains nothing useful — an empty product listing, a “no posts found” archive, a thin landing page that looks like a placeholder. Google's Search Central documentation explicitly tells site owners to return a real 404 or 410 status code for pages that no longer have content, because the lie wastes crawl budget and creates indexing ambiguity.
The Search Engine Land case study quantifies how fast this compounds. At peak, the publisher had roughly 120,000 soft 404 pages across thirteen country domains — 90,400 on the main site alone — plus 513,369 pages stuck in the “Crawled — currently not indexed” bucket in Brazil. Google was visiting those URLs and choosing not to add them to the index, which signals that the crawl budget was being spent on URLs the algorithm did not believe were worth ranking. That kind of waste pushes legitimate pages further back in the crawl queue. At the worst of it, new articles took 24 hours to get indexed, a serious handicap for a news publisher competing on freshness.
For a small business with a few hundred pages, the volumes are smaller — but the mechanics are identical. If your /search/?q=blue-widgets URL renders an empty “no results” page and returns HTTP 200, that's a soft 404. If your /category/winter-coats archive shows the “no posts” template after you cleared seasonal stock, that's a soft 404. If your Shopify collection still exists but every product inside has been unpublished, that's a soft 404. The page works for a human; the page is invisible — or worse, suppressive — for organic search.
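If you want to see that mismatch for yourself before trusting Search Console's label, a quick probe makes it concrete. Below is a minimal sketch, assuming Node 18+ for the built-in fetch; the URL and the empty-template markers are placeholders you'd swap for your own site's patterns:

```typescript
// soft404-probe.ts: flag a URL that returns HTTP 200 but renders an empty template.
// Assumes Node 18+ (global fetch). The URL and markers below are illustrative only.

const EMPTY_MARKERS = ["no results", "no products found", "nothing found"];

async function probe(url: string): Promise<void> {
  const res = await fetch(url, { redirect: "manual" });
  const body = await res.text();
  const looksEmpty = EMPTY_MARKERS.some((m) => body.toLowerCase().includes(m));

  if (res.status === 200 && looksEmpty) {
    console.log(`LIKELY SOFT 404: ${url} returns 200 but shows an empty-state template`);
  } else {
    console.log(`${url} -> HTTP ${res.status}, empty-state marker found: ${looksEmpty}`);
  }
}

probe("https://example.com/search/?q=blue-widgets").catch(console.error);
```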
Why is this dangerous rather than just inefficient? Because the algorithm has finite trust in your domain. Every empty, thin, or auto-generated page it crawls is a vote that your site doesn't deserve full crawl coverage. That can pull down indexing of pages you care about. The Search Engine Land case study found that fixing the soft 404 pile let indexed pages in Germany more than double — from ~150,000 to 370,748 — without a single line of new content being added.

How Do You Diagnose Soft 404s in Google Search Console in 60 Minutes?
The diagnostic flow lives almost entirely inside Google Search Console's Page Indexing report. Here is the order we run it for clients, including a small Fort Wayne service-business site we audited last quarter.
Step 1: Open the Pages report and sort by “Why pages aren't indexed.” The report groups every URL Google has discovered into reasons. The buckets you want to focus on are “Soft 404,” “Crawled — currently not indexed,” “Discovered — currently not indexed,” and “Duplicate without user-selected canonical.” Together those four buckets account for most crawl-budget waste in small business sites. Click into each one and look at the URL patterns — soft 404 buckets usually surface obvious culprits like /?s= search results, /tag/ archives, or /collections/all variants.
Step 2: Run the URL Inspection tool on three suspicious URLs from each bucket. The URL Inspection tool is the highest-signal check available in Search Console. Paste in a flagged URL, then click “Test live URL” → “View tested page” → “Screenshot” and “HTML.” If the rendered screenshot is blank, missing the main content, or shows a “no results” template, Google is rendering exactly what we expected — and it's classifying that render as soft 404 for a reason. Sometimes the issue is JavaScript: the page works in a browser but renders empty in the headless Chrome that Googlebot uses.
Step 3: Pull the Crawl Stats report under Settings. The Crawl Stats report shows how many requests Googlebot makes per day and which response codes it gets back. If your “Average response (ms)” is climbing while “Total crawl requests” stays flat, you're often spending budget on slow URL patterns — a common signal of parameterized soft 404 traps. The case study tracked crawl rate drops from 60,000–70,000 daily requests to 20,000–30,000, which was the first measurable signal that Google had lost faith in the domain. See also Google's crawl-budget management guide for larger sites.
Step 4: Search your own site with site: queries. A site:yourdomain.com inurl:? query in Google often surfaces the parameterized garbage your CMS is exposing — search results pages, sort/filter parameters, session IDs in URLs. If you see dozens of variants of the same canonical page, you have a duplicate-content + crawl-budget problem masquerading as a soft 404 problem.
Step 5: Cross-check Bing Webmaster Tools. Bing's Site Explorer tends to surface different indexing anomalies than Google, and it's free. We often find pages indexed in Bing but rejected by Google — the gap usually points to a JavaScript-rendering issue Google catches and Bing doesn't.
For a typical small business site with under 1,000 URLs, this five-step flow takes about an hour. The output is a list of URL patterns that are either soft 404s, crawl-budget waste, or both. That list is the input to the fix step.
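If clicking through URLs one at a time gets tedious, the same check scales to the whole list. Here's a sketch that batch-classifies the flagged URLs, assuming Node 18+ and a urls.txt file holding the example URLs exported from each Search Console bucket; the 2 KB body-size cutoff is a rough heuristic, not a Google threshold:

```typescript
// soft404-batch.ts: classify a list of Search Console example URLs by HTTP behavior.
// Assumes Node 18+ and a urls.txt file with one URL per line.

import { readFileSync } from "node:fs";

const MIN_BYTES = 2048; // heuristic: suspiciously small HTML often means an empty template

async function classify(url: string): Promise<string> {
  try {
    const res = await fetch(url, { redirect: "manual" });
    const body = await res.text();
    if (res.status >= 300 && res.status < 400) return "redirect";
    if (res.status === 404 || res.status === 410) return "hard 404/410 (good)";
    if (res.status === 200 && body.length < MIN_BYTES) return "POSSIBLE SOFT 404 (tiny body)";
    return `HTTP ${res.status} with substantial body`;
  } catch {
    return "fetch failed";
  }
}

async function main(): Promise<void> {
  const urls = readFileSync("urls.txt", "utf8").split("\n").filter(Boolean);
  for (const url of urls) {
    console.log(`${await classify(url)}\t${url}`);
  }
}

main().catch(console.error);
```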
What Causes Soft 404s on WordPress, Shopify, and Next.js Sites?
Each major small business platform ships its own flavor of soft 404, and the diagnostic shortcut is knowing which patterns to look for first. We see the same offenders repeatedly across Northeast Indiana clients and out-of-market builds.
| Platform | Common soft 404 sources | First fix |
|---|---|---|
| WordPress | /?s= search results pages, empty /category/ and /tag/ archives, expired LearnDash/WooCommerce product pages still resolving 200 | noindex search results via Yoast or Rank Math; 410 expired products; gate empty taxonomies |
| Shopify | /collections/all, /products/ for unpublished items, sort/filter URLs, vendor pages with no inventory | Use canonical tags pointing to live collections; redirect unpublished products; block filter parameters in robots.txt |
| WooCommerce | Out-of-stock products that auto-hide, sale-tag pages with no products, attribute filter URLs | 301 retired products to category; noindex empty attribute archives |
| Next.js / Headless | Dynamic routes returning fallback “not found” UI with 200 status, ISR pages serving stale empties, paginated routes past the last real page | Return notFound: true from getStaticProps; 404 invalid dynamic params at the route level |
| Squarespace / Wix | Hidden pages still resolving via direct URL, search results pages, empty blog tag pages | Manually 404 or noindex; check sitemaps for leaks |
The Next.js pattern is the one we see fixed least often. Modern frameworks default to soft 404s because returning a “not found” component with a 200 status code is easier than wiring up notFound: true properly. When we audit Next.js builds for clients we look for two specific signals: (1) the /404 route returning HTTP 200 when accessed directly, and (2) dynamic routes like /products/[slug] rendering a “Product not found” component without setting the right status code. Both are silent and both kill organic traffic over time. Our web performance optimization guide covers the related Core Web Vitals angle that often compounds the problem.
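For reference, here's what the correct Pages Router pattern looks like. This is a minimal sketch; getProductBySlug is a hypothetical stand-in for whatever data layer your build actually uses:

```typescript
// pages/products/[slug].tsx: serve a real 404 instead of a soft 404.

import type { GetStaticPaths, GetStaticProps } from "next";

type Product = { name: string };

// Hypothetical helper standing in for your CMS or commerce API.
async function getProductBySlug(slug: string): Promise<Product | null> {
  return null; // wire up to your real data source
}

export const getStaticPaths: GetStaticPaths = async () => ({
  paths: [],            // generate product pages on demand
  fallback: "blocking", // render on first request, then cache
});

export const getStaticProps: GetStaticProps = async ({ params }) => {
  const product = await getProductBySlug(String(params?.slug));

  if (!product) {
    // The critical line: Next.js now serves the 404 page with a real
    // HTTP 404 status instead of a 200 "Product not found" shell.
    return { notFound: true };
  }

  return { props: { product }, revalidate: 3600 };
};

export default function ProductPage({ product }: { product: Product }) {
  return <h1>{product.name}</h1>;
}
```

The App Router equivalent is calling notFound() from next/navigation inside the page or its data loader, which has the same effect on the response status.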
WordPress sites have a different recurring failure: the WPML or Polylang language switch creates duplicate URLs (/en/ versus /) where one variant ends up serving empty templates after a theme upgrade. The case study publisher hit this at scale on a domain migration; small business sites hit a smaller version of it constantly.

How a Fort Wayne Small Business Owner Can Run This Audit Without a Developer
We get asked this constantly: can a non-technical owner actually do a soft 404 audit, or does it require a developer? The honest answer is that the diagnostic phase is fully doable from the Search Console UI in an hour. The fix phase usually needs someone who can edit your CMS, but most small-shop CMSes give you the tools.
Picture a typical Auburn, Indiana small business — say a specialty retailer that runs a Shopify store and a WordPress blog. The owner notices her organic traffic has been drifting down for six months. She logs into Search Console, opens the Page Indexing report, and sees 142 URLs in the “Soft 404” bucket. About 90 of them are /search?q= URLs from her store's internal search. Another 30 are /collections/winter-2024 and similar seasonal collection URLs she emptied months ago. The remaining 22 are blog tag pages with one or zero posts.
That's a clean diagnosis with three concrete fix paths. In Shopify she can add a <meta name="robots" content="noindex"> to the search results template through theme code (or ask her theme support to do it). The seasonal collections she can either redirect (301) to the current season's collection or unpublish and let them return real 404s. The thin tag pages she can either consolidate (merge tag/winter-coats and tag/coats-winter into one canonical tag) or noindex if they're not getting traffic. The whole project is a couple of hours of work, mostly checkbox-style configuration inside the platforms, not custom code.
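Once those fixes ship, it's worth spot-checking at the HTTP level that they actually took effect. A small verification sketch, assuming Node 18+; the URLs mirror the hypothetical store above:

```typescript
// verify-fixes.ts: spot-check the three fix paths after shipping them.
// Node 18+; all URLs below mirror the hypothetical Shopify example above.

async function check(
  url: string,
  passes: (res: Response, body: string) => boolean,
  label: string
): Promise<void> {
  const res = await fetch(url, { redirect: "manual" });
  const body = await res.text();
  console.log(`${passes(res, body) ? "PASS" : "FAIL"}  ${label}  (${url})`);
}

async function main(): Promise<void> {
  // 1. Search results should now carry a noindex robots meta tag.
  await check(
    "https://example-store.com/search?q=widgets",
    (res, body) => res.status === 200 && /<meta[^>]+noindex/i.test(body),
    "search template noindexed"
  );
  // 2. Emptied seasonal collections should 301 to the live collection.
  await check(
    "https://example-store.com/collections/winter-2024",
    (res) => res.status === 301,
    "seasonal collection redirects"
  );
  // 3. Unpublished products should return a hard 404.
  await check(
    "https://example-store.com/products/retired-item",
    (res) => res.status === 404,
    "retired product returns 404"
  );
}

main().catch(console.error);
```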
The local nuance that matters here: small business owners in Fort Wayne, Auburn, and the broader Allen County / DeKalb County area tend to run on Shopify, WordPress, or Squarespace, not headless stacks. That's good news. Those platforms expose noindex controls, sitemap controls, and redirect controls in plain UI. If your site is on one of them, you can ship most of the fix yourself. If you're on a custom or Next.js build, that's when we recommend looping in a developer — and at Button Block, that's where our web development team gets involved.
One honest caveat: the Search Engine Land case study makes clear that recovery is slow even when fixes ship correctly. The publisher needed ~12 weeks to drop soft 404 errors 69% in Brazil. And total traffic only recovered to 25–35% of the original forecast even after the indexing crisis was resolved. The site that loses 90% of its traffic doesn't get all of it back. The bigger lesson: fix this early. Soft 404 prevention is cheaper than soft 404 remediation by an order of magnitude.
How Do You Actually Fix Soft 404s Without Making Things Worse?
The fix flow has four priorities, in order: return correct status codes, remove or noindex auto-generated empty pages, reinforce canonicalization, and plug the leak so the same pages can't regenerate. Skipping steps in that sequence is how teams accidentally deindex their own product catalog while trying to clean up soft 404s.
Priority 1: Return correct HTTP status codes. If a page genuinely has no content and no plan to get content again, return HTTP 410 (Gone) or 404 (Not Found) — both behaviors are documented in Google's HTTP and network errors guide. 410 tells Google “remove this permanently and don't bother re-crawling.” 404 tells Google “this isn't here right now” and the URL stays in the crawl queue longer. Use 410 for retired products and pages you'll never republish; use 404 for genuinely transient missing content.
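What does that look like in practice? On a Next.js build, middleware is one clean place to serve 410s for retired URL patterns. A minimal sketch; the retired path prefixes are examples, and on other stacks the same idea is a server rule or redirect map that returns 410:

```typescript
// middleware.ts: serve 410 Gone for permanently retired URL patterns (Next.js).
import { NextResponse, type NextRequest } from "next/server";

// Example prefixes; replace with your own retired sections.
const RETIRED_PREFIXES = ["/collections/winter-2023", "/products/discontinued-"];

export function middleware(req: NextRequest) {
  const { pathname } = req.nextUrl;

  if (RETIRED_PREFIXES.some((prefix) => pathname.startsWith(prefix))) {
    // 410 tells Google "gone for good, stop re-crawling."
    return new NextResponse("Gone", { status: 410 });
  }

  // Everything else falls through to normal routing (and real 404s).
  return NextResponse.next();
}
```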
Priority 2: Remove auto-generated empty pages or noindex them. Internal search results pages, sort/filter parameter URLs, and empty archive pages should not be in the index. Add <meta name="robots" content="noindex"> per Google's noindex documentation or block via robots.txt if you can't change the page template. Don't do both — robots.txt blocking prevents Google from seeing the noindex tag, which means the URL can stay in the index as a URL-only entry.
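If you can't edit the page template, a header-based noindex sidesteps the trap entirely; Google honors the X-Robots-Tag response header the same way it honors the meta tag. Here's a sketch assuming Next.js 15+, which accepts a TypeScript config file:

```typescript
// next.config.ts: send X-Robots-Tag: noindex on internal search results.
// Do NOT also disallow /search in robots.txt, or Googlebot never fetches
// the page and never sees this header.
import type { NextConfig } from "next";

const nextConfig: NextConfig = {
  async headers() {
    return [
      {
        source: "/search", // adjust to your internal search route
        headers: [{ key: "X-Robots-Tag", value: "noindex" }],
      },
    ];
  },
};

export default nextConfig;
```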
Priority 3: Reinforce canonicalization for variant pages. If you have duplicate variants of the same content (paginated archives, tracking-parameter URLs, sort variants), make sure each variant points its rel=canonical to the master version. The Search Engine Land case study found 2,532 “Alternate page with canonical tag” entries in Search Console — most of which were resolved correctly by tightening canonical implementations.
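On a Next.js build the canonical tag is a one-liner in the page head. A Pages Router sketch; the domain and the slug prop are placeholders:

```typescript
// pages/blog/[slug].tsx excerpt: point every variant at one canonical URL.
import Head from "next/head";

export default function PostPage({ slug }: { slug: string }) {
  // Canonical deliberately strips ?sort=, ?utm_*, and /page/N variants.
  const canonical = `https://example.com/blog/${slug}`;

  return (
    <>
      <Head>
        <link rel="canonical" href={canonical} />
      </Head>
      {/* page body */}
    </>
  );
}
```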
Priority 4: Plug the leak. Every fix above is reactive. The proactive piece is ensuring your CMS, theme, or build can't generate the same problem again. For WordPress that means a permanent noindex on search results, tag pages with fewer than X posts, and author pages with no published posts. For Shopify that means a robots.txt rule for filter parameters. For Next.js that means a code review of every dynamic route to confirm it returns notFound: true when the data isn't there. For all platforms it means a quarterly check of the Page Indexing report so the next pile of soft 404s gets caught at 5 pages, not 90,400.
We also recommend pairing this with a content decay audit. Thin and stale pages don't get classified as soft 404s on their own — but they're adjacent to the same algorithmic judgment about whether a page deserves index space. The two clean-up flows reinforce each other.

How Do Soft 404s Affect AI Search and AEO Citations?
This is the question we get most from clients running answer engine optimization programs alongside traditional SEO. Short answer: soft 404s hurt AI citations roughly the same way they hurt Google rankings, but the mechanism is slightly different.
Google's AI Overviews and Search Generative Experience pull from pages that are indexed in Google's main index. If your page is sitting in the “Crawled — currently not indexed” bucket — which is what soft 404s and thin-content pages often look like — it cannot be cited in AI Overviews because it never made it into the candidate set. The case study publisher noted that their Google Discover share grew significantly only after the indexing crisis was resolved, which is the same dependency: AI surfaces sample from the indexed corpus, not from the crawled corpus.
For ChatGPT, Perplexity, and Claude with web search, the dependency is similar but not identical. Those systems use third-party crawlers and indexes (e.g., Bing for Copilot, various retrieval-augmented setups for the others) and the same soft 404 hygiene rules apply. If your page renders empty to the crawler — whether that's Googlebot, Bingbot, or GPTBot — it's not going to be retrieved or cited.
There's a second-order effect we see less discussed: soft 404 pages can poison the rest of your site's authority signals. Internal links pointing into thin/empty pages are essentially wasted authority — you're telling the algorithm to spend trust on a page that, on inspection, contains nothing. Cleaning up soft 404s by either fixing the content or removing the page reroutes that internal-link authority to pages that actually rank and cite. That's the part most “fix your soft 404s” guides miss.
The Small Business Soft 404 Prevention Checklist
The diagnostic-and-fix flow above is the reactive playbook. Once you've shipped the cleanup, the next step is making sure you don't re-accumulate the same pile in six months. Here's the recurring checklist we run with retainer clients:
- Monthly: Open the Page Indexing report in Search Console. Skim the “Why pages aren't indexed” section. Note any new URL patterns in Soft 404 or Crawled — currently not indexed. Investigate any pattern that appears across more than five URLs.
- Quarterly: Re-run a site:yourdomain.com review of indexed URLs. Look for unfamiliar parameter URLs, search-results URLs, or paginated archives that shouldn't be there.
- Pre-launch (any CMS upgrade, theme change, or migration): Validate that 404 and 410 status codes still return correctly and that sitemap.xml doesn't include URLs that have been removed (see the sketch after this list). Run Screaming Frog or Sitebulb against staging before push.
- Post-launch: Watch the Search Console Pages report for two weeks. Any spike in soft 404s in that window means something in the migration broke status codes for a URL pattern. Fix immediately — recovery from a delayed catch is much slower.
- Annually: Audit redirects. 301 redirect chains accumulate over time and can themselves trigger soft 404 classifications if intermediate URLs serve thin content.
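Here's the pre-launch validation from the checklist as a runnable sketch, assuming Node 18+; the staging URL and removed paths are placeholders for your own:

```typescript
// prelaunch-check.ts: validate 404 behavior and sitemap hygiene before a push.
// Node 18+; STAGING and REMOVED_URLS are placeholders.

const STAGING = "https://staging.example.com";
const REMOVED_URLS = ["/collections/winter-2024", "/old-landing-page"];

async function main(): Promise<void> {
  // 1. A garbage URL must return a real 404, not a 200 "not found" page.
  const garbage = await fetch(`${STAGING}/this-page-should-not-exist-xyz`);
  console.log(
    garbage.status === 404
      ? "PASS: hard 404 intact"
      : `FAIL: garbage URL returned ${garbage.status}`
  );

  // 2. Removed URLs must not leak back into sitemap.xml.
  const sitemap = await (await fetch(`${STAGING}/sitemap.xml`)).text();
  for (const path of REMOVED_URLS) {
    console.log(
      sitemap.includes(path)
        ? `FAIL: ${path} still listed in sitemap.xml`
        : `PASS: ${path} absent from sitemap.xml`
    );
  }
}

main().catch(console.error);
```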
For most small business sites under 5,000 pages, that recurring cadence — about an hour per month plus a longer audit before any major CMS change — keeps the soft 404 pile under control indefinitely.

How Button Block Helps Fort Wayne and Northeast Indiana Sites Avoid Traffic Collapse
We've been running technical SEO audits for small businesses across Auburn, Fort Wayne, and the broader Allen County / DeKalb County corridor for years, and the soft 404 audit is the single highest-ROI check we do. It's free to start, requires no platform changes to diagnose, and the fixes are usually configuration work — not custom development. If your Search Console traffic has been drifting downward without an obvious cause, the Pages report is the first place we look.
If you'd rather not run this audit yourself, our SEO services team handles the full diagnostic flow — Pages report walkthrough, URL Inspection sampling, crawl-budget review, and a written remediation plan tied to your platform. For Next.js and headless builds where the fix involves code changes, we loop in our web development team directly. Most audits ship in under two weeks, and the fixes typically pay for themselves in restored organic traffic within a quarter.
Worried About Quiet Indexing Drift?
Button Block runs the 60-minute diagnostic on your live site, walks you through the soft 404 and crawl-budget buckets, and writes a fix plan tied to your CMS — no agency contract required to start.
Frequently Asked Questions
- What's the difference between a soft 404 and a regular 404?
- A regular 404 returns HTTP status code 404 (Not Found) and tells Google the page genuinely doesn't exist. A soft 404 returns HTTP 200 (OK) but the page content is empty, thin, or looks like a "no results" template. Google classifies it as a soft 404 because the URL technically works but doesn't serve real content — which the algorithm treats as crawl-budget waste rather than a legitimate page.
- How long does it take to recover from a soft 404 problem?
- In the Search Engine Land case study, soft 404 errors dropped 69% in Brazil within twelve weeks of remediation. Full traffic recovery is slower and often incomplete — the publisher reached only 25–35% of forecasted traffic even after fixing the indexing issues. For smaller sites with a few hundred URLs, expect Search Console to start reflecting the fix within two to four weeks, with full reclassification taking one to three months.
- Can soft 404s hurt my AI Overviews and ChatGPT visibility?
- Yes. AI Overviews pull citations from Google's main index; if your page is stuck in "Crawled — currently not indexed" (often a soft 404 cousin), it won't be in the candidate pool that AI surfaces sample from. The same dependency applies to ChatGPT, Perplexity, and Bing-backed AI features. Fixing soft 404s is a prerequisite for getting cited in AI answers, not just for ranking in classic results.
- Should I noindex or 404 a page that has no content?
- If the page may have content again later (a seasonal collection, an out-of-stock product you'll restock), add noindex and keep the URL alive. If the page is permanently gone, return HTTP 410 (Gone) so Google removes it faster than a 404. Don't combine noindex with robots.txt blocking — robots.txt prevents Google from seeing the noindex tag, which means the URL can stay in the index as a URL-only entry.
- How do I find soft 404s in Google Search Console quickly?
- Open Search Console → Indexing → Pages → scroll to "Why pages aren't indexed" → click the "Soft 404" row. Search Console shows up to 1,000 example URLs per reason. Sort by URL pattern and look for repeating patterns like /?s=, /collections/, or /tag/. Use the URL Inspection tool on three to five examples per pattern to confirm Google is rendering them as empty.
- Are Shopify and WordPress soft 404s the same problem?
- The underlying mechanism is identical — pages returning 200 with no real content — but the common sources differ. WordPress soft 404s usually come from internal search results, empty tag archives, and expired WooCommerce products. Shopify soft 404s usually come from /collections/ URLs with no published products, filter-parameter variants, and unpublished /products/ pages. The fix tools also differ: WordPress relies on SEO plugins like Yoast for noindex control, while Shopify needs theme code edits or robots.txt rules.
- Do soft 404s hit Fort Wayne and Northeast Indiana small business sites differently?
- The mechanics are identical, but the common causes skew toward the platforms small Auburn, Fort Wayne, and Allen County businesses actually use — Shopify storefronts, WordPress + WooCommerce stacks, and Squarespace marketing sites. The most frequent triggers we see across Northeast Indiana clients are seasonal Shopify collection pages left up after a clearance, empty WordPress tag archives from a long-since-retired blog series, and WooCommerce products marked out-of-stock instead of properly retired. A monthly Pages report check usually catches new soft 404 patterns at 5–10 URLs instead of 500–1,000.
Sources & Further Reading
- Search Engine Land: Indexing Issues That Caused a 90% Traffic Drop — and How We Fixed Them — The 2026-05-12 case study that motivates this playbook.
- Google Search Central: Soft 404 errors documentation — Official guidance on why and how to return real 404 or 410 status codes.
- Google Search Console Help: Page Indexing report — The primary diagnostic surface for soft 404 buckets.
- Google Search Console Help: URL Inspection tool — Used in Step 2 of the diagnostic flow to confirm Google's render.
- Google Search Central: Large site owner's guide to managing your crawl budget — Reference for why soft 404s compound the crawl-budget problem.
- Google Search Console Help: Crawl Stats report — Used in Step 3 of the diagnostic flow.
- Bing Webmaster Tools: Site Explorer documentation — Free cross-check for indexing anomalies Google may not surface.
- Google Search Central: Block indexing with noindex — Reference for safe noindex implementation.
