By SitemapFixer Team
Published April 2026

Sitemap Audit Case Study: 47 Blocked Pages and How We Fixed Them


This is a walkthrough of a real sitemap audit we ran on a composite SaaS marketing site: a B2B company with roughly 380 indexed pages spread across its main marketing pages, a docs subdirectory, and an integrated blog. The site had been live for three years and had gone through two CMS migrations in that time. Organic traffic had plateaued for six months despite consistent content production.

When we pulled the sitemap, it listed 312 URLs. The audit found that 47 of them had problems serious enough to affect indexing. Here is what we found, what the impact was, and exactly what we changed.

The Starting Point: What the Sitemap Looked Like

The site used WordPress with Yoast SEO generating the sitemap automatically. The sitemap index at /sitemap_index.xml pointed to four child sitemaps: post, page, product (a legacy taxonomy), and category. Total declared URLs: 312.

The first thing we did was fetch every URL in the sitemap and record its HTTP status code, its canonical tag, its robots meta tag, and whether it redirected. We used a script that sent a plain HTTP GET to each URL without following redirects and stored both the response headers and the HTML head, since the canonical and robots meta tags live in the HTML rather than in the headers. That crawl took about four minutes for 312 URLs.

Finding 1: 19 URLs Returning 301 Redirects

19 of the 312 sitemap URLs (6.1%) returned a 301 status code instead of 200. All of them were old blog post slugs that had been changed during the second CMS migration eight months earlier. The redirects worked in the sense that users landed on the right pages, but the sitemap still listed the old URLs.

Why this matters: Google's sitemap guidelines say to list the canonical URLs you want to appear in search results, and a URL that redirects is not one of them. When Googlebot finds a 301 in a sitemap, it follows the redirect and eventually works out the destination. But the old URLs stay in the sitemap, wasting crawl budget and muddying the indexing signal. In this case, GSC showed several of the old URLs as "Crawled - currently not indexed", which suggests Google was not confidently consolidating signals onto the final URLs.

Fix: Updated the Yoast sitemap to reflect the current slugs. Where the redirect chain was longer than one hop (three cases), we also removed the intermediate redirects so each old URL now returns a single 301 straight to its final destination.
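Multi-hop chains like those three can be found by following each redirect one hop at a time and keeping the whole path. In this sketch, `trace_redirects` is a hypothetical helper, and `fetch` is a callable you supply that returns `(status, Location header)` for a URL without following redirects; passing it in keeps the tracer testable offline.

```python
from typing import Callable, List, Optional, Tuple
from urllib.parse import urljoin

REDIRECT_CODES = {301, 302, 303, 307, 308}

# fetch(url) -> (status_code, Location header or None), without following redirects
Fetch = Callable[[str], Tuple[int, Optional[str]]]


def trace_redirects(url: str, fetch: Fetch, max_hops: int = 10) -> List[str]:
    """Follow redirects one hop at a time; return every URL visited, final one last."""
    chain = [url]
    while True:
        status, location = fetch(chain[-1])
        if status not in REDIRECT_CODES or not location:
            return chain
        next_url = urljoin(chain[-1], location)  # Location may be a relative path
        if next_url in chain:
            raise ValueError("redirect loop: " + " -> ".join(chain + [next_url]))
        chain.append(next_url)
        if len(chain) - 1 > max_hops:
            raise ValueError(f"more than {max_hops} redirects starting at {url}")
```

`len(chain) - 1` is the hop count: anything above 1 is a chain worth collapsing, `chain[-1]` is the URL that belongs in the sitemap, and `chain[0]` is the old URL whose redirect should point straight at it.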

Finding 2: 11 Pages with Noindex + Sitemap Conflict

11 URLs in the sitemap had <meta name="robots" content="noindex"> in their HTML head. This is a direct contradiction: the sitemap tells Google "please crawl and index this," while the page itself tells Google "do not index this."

These fell into two groups. Seven were tag archive pages (e.g., /tag/webinar/) that a developer had correctly set to noindex during a thin-content cleanup — but nobody had updated the Yoast sitemap settings to exclude those taxonomies. Four were landing pages that a marketer had set to noindex experimentally ("to keep them out of organic for a bit") and then forgotten about for over a year.

Why this matters: Google generally resolves this conflict by respecting the noindex and ignoring the sitemap entry, but the inconsistency still costs crawl budget and adds noise to GSC. All 11 appeared in GSC as "Excluded by 'noindex' tag", meaning Google was spending crawl capacity discovering and re-verifying pages it would never index.

Fix: For the tag archives, we excluded the taxonomy from Yoast's sitemap settings entirely. Of the four forgotten landing pages, two were restored to indexable (they had commercial value) and two were formally excluded from the sitemap and left noindexed.
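A sketch of the noindex check, assuming you already have each sitemap URL's robots meta content and, optionally, its `X-Robots-Tag` response header. The header goes beyond what this audit describes, but Google honors noindex delivered either way, so checking both avoids missing pages that set it at the server level. The function names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def has_noindex(robots_meta: Optional[str], x_robots_tag: Optional[str] = None) -> bool:
    """True if the robots meta content or the X-Robots-Tag header blocks indexing."""
    for value in (robots_meta, x_robots_tag):
        if not value:
            continue
        for token in value.split(","):
            # "googlebot: noindex" -> "noindex". User-agent prefixes are ignored, so a
            # directive aimed at any crawler counts (deliberately conservative).
            directive = token.split(":")[-1].strip().lower()
            if directive in ("noindex", "none"):  # "none" means noindex, nofollow
                return True
    return False


def sitemap_noindex_conflicts(checks: Dict[str, Tuple[Optional[str], Optional[str]]]) -> List[str]:
    """checks maps each sitemap URL to (robots meta content, X-Robots-Tag header)."""
    return sorted(url for url, (meta, header) in checks.items() if has_noindex(meta, header))
```

Every URL this returns needs a decision like the ones above: either drop the noindex because the page should rank, or drop the URL from the sitemap.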

Finding 3: 9 Staging Environment URLs in the Production Sitemap

This one was the most surprising. Nine URLs in the sitemap pointed to staging.example.com rather than example.com. They were all in the category sitemap and traced back to a partial database restore that had been run during the second CMS migration — category term URLs had been imported with absolute URLs from the staging environment and never corrected.

The staging domain was password-protected, so all nine returned HTTP 401 when fetched. GSC reported them as "Server error (5xx)", which misstates the actual 401, but the effect was the same: Google could not fetch any of the nine listed URLs, so the sitemap gave the real production category pages no discovery signal at all.

Fix: We ran a direct database query to correct the term metadata URLs in WordPress, then force-regenerated the sitemap and verified that all nine now returned 200 on the production domain.
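A staging leak like this is easy to catch mechanically by comparing every sitemap URL's host against the production host. A minimal sketch with a hypothetical `off_domain_urls` helper, assuming `example.com` is the production host and that `www` and non-`www` variants count as different hosts (add both to `production_hosts` if your site deliberately serves both):

```python
from typing import FrozenSet, Iterable, List
from urllib.parse import urlsplit


def off_domain_urls(urls: Iterable[str],
                    production_hosts: FrozenSet[str] = frozenset({"example.com"})) -> List[str]:
    """Return sitemap URLs whose host is not one of the production hosts.

    urlsplit().hostname is already lowercased, so EXAMPLE.com matches example.com.
    """
    allowed = {h.lower() for h in production_hosts}
    return [u for u in urls if (urlsplit(u).hostname or "") not in allowed]
```

Running this on the regenerated sitemap is a cheap way to confirm a fix like the database update above actually removed every staging URL.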

Finding 4: 8 Pages with Canonicals Pointing to Another URL

Eight pages in the sitemap had canonical tags pointing to a different URL instead of to themselves. In six cases, the canonical pointed to a near-duplicate version of the page under a different slug that no longer existed and returned 404. In the other two, the canonical pointed to the homepage. This looked like a Yoast misconfiguration: each page had been set to noindex at some point, its canonical field had later been cleared by hand, and a default fallback had then applied.

Why this matters: A canonical pointing to a 404 tells Google that the preferred version of this page is a page that does not exist. Google usually ignores a broken canonical and chooses a canonical on its own, often the URL it crawled, but the conflicting hint adds ambiguity about which URL should accumulate ranking signals. These eight pages showed notably lower impressions in GSC than their content quality and internal link count would suggest.

Fix: Updated all eight to self-referential canonicals. For the two with homepage canonicals, we also logged the root cause as a Yoast configuration issue to prevent recurrence.
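A sketch of the canonical comparison, using a hypothetical `canonical_status` helper that sorts each page into the cases this finding ran into. Relative hrefs are resolved against the page URL, scheme and host are compared case-insensitively, and fragments are dropped; trailing slashes and query strings are deliberately left significant, because `/page` and `/page/` are different URLs.

```python
from typing import Optional
from urllib.parse import urljoin, urlsplit, urlunsplit


def _normalize(url: str) -> str:
    """Lowercase scheme and host, default an empty path to "/", drop the fragment."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), parts.path or "/", parts.query, ""))


def canonical_status(page_url: str, canonical_href: Optional[str]) -> str:
    """Classify a page's canonical as "missing", "self", "homepage", or "other"."""
    if not canonical_href:
        return "missing"
    resolved = _normalize(urljoin(page_url, canonical_href.strip()))
    if resolved == _normalize(page_url):
        return "self"
    if urlsplit(resolved).path == "/" and not urlsplit(resolved).query:
        return "homepage"
    return "other"
```

Pages classified as `"other"` still need a follow-up fetch of the canonical target to tell a 404 target apart from a live near-duplicate.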

The Before/After: What Changed in GSC

We pushed all fixes in a single deployment and resubmitted the sitemap in GSC. Over the following six weeks, we tracked the GSC coverage report:

  • "Valid" URLs in sitemap: went from 241 to 301 (+60, reflecting the corrected staging URLs, the resolved noindex conflicts, and the corrected canonicals being picked up)
  • "Crawled — currently not indexed": dropped from 34 to 11
  • "Excluded — noindex": dropped from 19 to 8 (the eight we intentionally kept noindexed)
  • Organic impressions (90-day period): up 22% vs. the prior 90-day period, with clicks up 18%

The category pages recovered most noticeably. Four of the nine category pages whose sitemap entries had pointed to staging were previously ranking on pages 4–6 for their target keywords; within five weeks of the fix, three of them had moved up to pages 2–3.

How to Run This Audit Yourself

The methodology is straightforward. Fetch your sitemap index and enumerate every URL across all child sitemaps. Then, for each URL, record: the HTTP status code of the first response (fetched without following redirects, so a 301 shows up as a 301 rather than as the 200 of its destination), the canonical tag value from the HTML, the robots meta tag value, and whether the domain in the URL matches your production domain.
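The enumeration step can be sketched with Python's standard XML parser. This assumes uncompressed sitemaps in the standard sitemaps.org namespace (gzipped `.xml.gz` files would need decompressing first). `parse_sitemap`, `enumerate_sitemap`, and `http_fetch` are hypothetical names, and the fetcher is passed in so the walk can be tested without network access.

```python
import urllib.request
import xml.etree.ElementTree as ET
from typing import Callable, List, Optional, Set, Tuple

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
NS = {"sm": SITEMAP_NS}

Entry = Tuple[str, Optional[str]]  # (loc, lastmod or None)


def parse_sitemap(xml_data) -> Tuple[str, list]:
    """Return ("index", [child sitemap URLs]) or ("urlset", [(loc, lastmod), ...])."""
    root = ET.fromstring(xml_data)
    if root.tag == f"{{{SITEMAP_NS}}}sitemapindex":
        return "index", [(el.text or "").strip() for el in root.findall("sm:sitemap/sm:loc", NS)]
    entries: List[Entry] = []
    for url_el in root.findall("sm:url", NS):
        loc = url_el.findtext("sm:loc", default="", namespaces=NS).strip()
        lastmod = url_el.findtext("sm:lastmod", namespaces=NS)
        entries.append((loc, lastmod.strip() if lastmod else None))
    return "urlset", entries


def enumerate_sitemap(url: str, fetch: Callable[[str], bytes],
                      seen: Optional[Set[str]] = None) -> List[Entry]:
    """Walk a sitemap index (or a plain urlset) and return every (loc, lastmod) entry."""
    seen = set() if seen is None else seen
    if url in seen:  # guard against an index that lists itself or a cycle
        return []
    seen.add(url)
    kind, items = parse_sitemap(fetch(url))
    if kind == "urlset":
        return items
    entries: List[Entry] = []
    for child_url in items:
        entries.extend(enumerate_sitemap(child_url, fetch, seen))
    return entries


def http_fetch(url: str, timeout: float = 10.0) -> bytes:
    """Default fetcher: plain GET returning raw bytes (ET.fromstring accepts bytes)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()
```

For a site shaped like the one in this case study, `enumerate_sitemap("https://example.com/sitemap_index.xml", http_fetch)` would return one entry per URL across the child sitemaps.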

Flag any URL that:

  • Returns anything other than HTTP 200
  • Has a noindex robots directive
  • Has a canonical pointing to a different URL than itself
  • Contains a domain name that is not your production domain
  • Has a lastmod date that is more than two years old (often signals orphaned or stale content worth reviewing)
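The five rules above translate almost directly into code. In this sketch, `AuditRow` and `flag` are hypothetical: the row holds what the crawl collected for one URL, the two-year lastmod threshold is approximated as 730 days, and `today` is a parameter so results are reproducible. The canonical check here is a plain string comparison after resolving relative hrefs; a stricter normalization may be worth adding.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional
from urllib.parse import urljoin, urlsplit


@dataclass
class AuditRow:
    url: str
    status: int               # status of the first response, redirects not followed
    noindex: bool             # noindex found in the robots meta tag (or X-Robots-Tag)
    canonical: Optional[str]  # href of <link rel="canonical">, if any
    lastmod: Optional[str]    # <lastmod> from the sitemap, e.g. "2024-01-15" or a full datetime


def parse_lastmod(value: str) -> date:
    # Only the YYYY-MM-DD prefix matters for staleness; values shorter than a full
    # date (e.g. "2024-01") raise ValueError.
    return date.fromisoformat(value.strip()[:10])


def flag(row: AuditRow, production_host: str, today: date, stale_days: int = 730) -> List[str]:
    """Return the reasons this sitemap URL needs attention (empty list if none)."""
    reasons = []
    if row.status != 200:
        reasons.append(f"status {row.status}")
    if row.noindex:
        reasons.append("noindex")
    if row.canonical and urljoin(row.url, row.canonical).split("#")[0] != row.url:
        reasons.append("canonical points elsewhere")
    if (urlsplit(row.url).hostname or "") != production_host.lower():
        reasons.append("non-production domain")
    if row.lastmod and (today - parse_lastmod(row.lastmod)).days > stale_days:
        reasons.append("lastmod older than two years")
    return reasons
```

The first four reasons map onto the four findings in this case study; the lastmod reason is a review prompt rather than an indexing error.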

Cross-reference the flagged URLs with GSC's Page indexing report (formerly the Coverage report) to see which ones Google has actually tried to index and what status it assigned them. Combining your sitemap data with GSC data shows you what is broken and how to prioritize each fix.
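One way to do the cross-referencing is to export the example URLs for each indexing status from GSC as CSV and join them against your flagged list. The column name `"URL"` and the one-export-per-status layout are assumptions about the export format; check the header row of your own export and pass `url_column` accordingly. `load_gsc_export` and `cross_reference` are hypothetical helpers.

```python
import csv
from typing import Dict, Iterable, List, Set, Tuple


def load_gsc_export(lines: Iterable[str], url_column: str = "URL") -> Set[str]:
    """Read the URL column from one exported CSV (any iterable of lines, e.g. an open file)."""
    return {row[url_column].strip() for row in csv.DictReader(lines) if row.get(url_column)}


def cross_reference(flagged: Dict[str, List[str]],
                    gsc_by_status: Dict[str, Set[str]]) -> Dict[str, Tuple[List[str], List[str]]]:
    """Pair each flagged URL's audit reasons with the GSC statuses that list it."""
    report = {}
    for url, reasons in flagged.items():
        statuses = sorted(status for status, urls in gsc_by_status.items() if url in urls)
        report[url] = (reasons, statuses or ["not reported by GSC"])
    return report
```

A flagged URL that GSC does not report at all is often lower priority than one Google has already crawled and excluded, since the latter is actively consuming crawl attention.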

The whole process on a 300-URL site takes about two hours manually. On a 3,000-URL site, you need a tool. SitemapFixer runs this exact analysis automatically — it fetches every URL in your sitemap, checks status codes, canonical consistency, and noindex conflicts, and surfaces only the issues that affect indexing.
