By SitemapFixer Team
Published April 2026 · 9 min read

Non-Canonical URLs in Your Sitemap: Why It Happens and How to Fix It


A non-canonical URL in your sitemap is a URL that your own canonical tags say is not the preferred version of a page. It sounds contradictory — and it is. Your sitemap tells Google "this URL is important, please crawl it," while your canonical tag on that same page says "actually, the real version of this content lives elsewhere." Google encounters this conflict far more often than most SEOs realize, and it's one of the most reliable causes of the frustrating "Alternate page with proper canonical tag" status in Google Search Console.

This isn't just a housekeeping issue. When a significant portion of your sitemap points to non-canonical URLs, you're actively misdirecting Googlebot toward pages that won't be indexed, burning crawl budget that could go to content that actually matters.

What Actually Happens When Google Finds a Non-Canonical URL in Your Sitemap

Google's sitemap documentation is explicit: URLs in a sitemap should be the canonical versions of those pages. When Google crawls a sitemap URL and finds a canonical tag pointing elsewhere, it has a few options, and none of them are what you want:

  • It may follow the canonical tag and index the canonical URL instead. This sounds okay but means your sitemap entry effectively redirected Googlebot to a different URL — an indirect, confusing signal.
  • It may ignore the canonical tag and attempt to index the sitemap URL as-is, creating a duplicate or near-duplicate in the index.
  • It may skip the sitemap URL entirely and classify it as "Alternate page with proper canonical tag" — which is what you see in GSC when Google agrees with the canonical and doesn't index the URL you submitted.

In practice, Google usually does a version of the first or third option. The result is a sitemap that gets processed but produces no indexing benefit for the submitted URLs.

Root Cause 1: Trailing Slash Inconsistency

This is the most common source of non-canonical sitemap entries. Your sitemap generator emits URLs one way (with or without a trailing slash) and your pages declare canonicals the other way. For example:

  • Sitemap: https://yourdomain.com/blog/my-post
  • Canonical on that page: https://yourdomain.com/blog/my-post/

These are technically different URLs. Google may treat them as the same, or may not. The canonical tag is explicit — it says the trailing-slash version is preferred. Your sitemap submitted the non-trailing-slash version. This is a conflict.

This is especially prevalent in WordPress sites where the permalink structure uses trailing slashes by default but sitemap plugins don't consistently apply the same normalization. It's also common in Next.js sites where trailingSlash: true in next.config.js causes redirects but the sitemap.ts doesn't mirror the setting.
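
One way to keep the two in sync is to pass every URL through a single normalization step before it is written to the sitemap. The following is a minimal sketch, not a drop-in fix: the function name `with_trailing_slash_policy` and the rule that file-like paths such as `/sitemap.xml` are left alone are assumptions made for illustration.

```python
from urllib.parse import urlsplit, urlunsplit


def with_trailing_slash_policy(url: str, trailing_slash: bool) -> str:
    """Return url with its path adjusted to the site's trailing-slash policy.

    Hypothetical helper: set trailing_slash to whatever your canonical
    tags actually use.
    """
    parts = urlsplit(url)
    path = parts.path or "/"
    last_segment = path.rsplit("/", 1)[-1]
    # Leave the root path and file-like paths (e.g. /sitemap.xml) untouched.
    if path != "/" and "." not in last_segment:
        if trailing_slash and not path.endswith("/"):
            path += "/"
        elif not trailing_slash and path.endswith("/"):
            path = path.rstrip("/")
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))


print(with_trailing_slash_policy("https://yourdomain.com/blog/my-post", True))
# -> https://yourdomain.com/blog/my-post/
```

Whatever form the helper takes, the point is that the sitemap generator and the canonical tag template should read the same setting rather than each hard-coding its own.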

Root Cause 2: HTTP vs. HTTPS Mismatch

Even in 2026, we regularly see sitemaps listing HTTP URLs while the canonical tags on those pages use HTTPS. This happens most often after migrations where the sitemap generator was not updated alongside the SSL installation.

The symptom: your sitemap contains http://yourdomain.com/page, but visiting that URL redirects to https://yourdomain.com/page, which has a canonical of https://yourdomain.com/page. The sitemap URL is not the canonical. In GSC, the sitemap shows these URLs as submitted, but none of them are indexed under the HTTP address; they typically land under a status such as "Page with redirect."

The fix is straightforward: ensure your sitemap generator always produces HTTPS URLs. In WordPress, this means setting your WordPress URL to HTTPS in Settings > General and confirming the sitemap plugin reflects the change. Always validate by fetching your sitemap and checking that every <loc> entry starts with https://.
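
The validation step can be scripted. Below is a small sketch using only the Python standard library; it assumes you have already downloaded the sitemap into a string and that it is a plain urlset using the standard sitemaps.org namespace. The function name `non_https_locs` is made up for this example.

```python
import xml.etree.ElementTree as ET

# Namespace used by standard sitemaps.org urlset files.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def non_https_locs(sitemap_xml: str) -> list[str]:
    """Return every <loc> value that does not start with https://."""
    root = ET.fromstring(sitemap_xml)
    locs = [el.text.strip() for el in root.iter(f"{SITEMAP_NS}loc") if el.text]
    return [url for url in locs if not url.startswith("https://")]


SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://yourdomain.com/page</loc></url>
  <url><loc>https://yourdomain.com/other</loc></url>
</urlset>"""

print(non_https_locs(SAMPLE))
# -> ['http://yourdomain.com/page']
```

An empty list means every entry passed the check.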

Root Cause 3: www vs. Non-www Domain Variation

This is the same kind of mismatch as the HTTP/HTTPS issue, applied to the hostname. If your canonical domain is https://www.yourdomain.com but your sitemap generates https://yourdomain.com/page (no www), your sitemap URLs are non-canonical.

This conflict is usually introduced during site setup when the server redirects to www (or non-www) but the environment variable or base URL used by the sitemap generator wasn't updated to match. The canonical tags on pages reference the correct preferred domain; the sitemap references a domain that redirects to the correct one.

Verify this by fetching your sitemap and checking the exact domain in every <loc> tag against the rel=canonical value on those pages. They must be identical, character for character.
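
As a quick first pass before comparing against each page's canonical, you can flag any sitemap URL whose hostname is not the preferred one. This sketch assumes the sitemap's `<loc>` values are already in a list and that you know your preferred host; `off_host_urls` is a hypothetical name.

```python
from urllib.parse import urlsplit


def off_host_urls(urls: list[str], preferred_host: str) -> list[str]:
    """Return the URLs whose hostname differs from the preferred host."""
    wanted = preferred_host.lower()
    # urlsplit().hostname is already lowercased; it is None for malformed URLs.
    return [u for u in urls if (urlsplit(u).hostname or "") != wanted]


sitemap_urls = [
    "https://yourdomain.com/page",
    "https://www.yourdomain.com/page",
]
print(off_host_urls(sitemap_urls, "www.yourdomain.com"))
# -> ['https://yourdomain.com/page']
```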

Root Cause 4: Tracking Parameters in Sitemap URLs

Some CMS or e-commerce platforms attach tracking or session parameters to URLs when they generate sitemaps from internal link lists or database exports. A URL like https://yourdomain.com/product?source=newsletter&session=abc123 in your sitemap points to a parameterized URL, but your canonical tag on that page points to https://yourdomain.com/product.

Google will crawl the parameterized URL, read the canonical, and not index the parameterized version — correctly. But you've wasted a sitemap entry and a crawl on a URL that was never going to be indexed. At scale, this is a meaningful crawl budget drain.

This also shows up with internal search result URLs, sorted/filtered product listing pages, and paginated URLs where only page 1 has a self-referencing canonical. Check your sitemap for ? characters — any parameterized URL needs scrutiny.
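
The check for ? characters is easy to automate. The sketch below assumes the sitemap URLs are already in a list and simply reports which ones carry a query string and which parameter names they use; the function name `parameterized_urls` is illustrative.

```python
from urllib.parse import parse_qsl, urlsplit


def parameterized_urls(urls: list[str]) -> dict[str, list[str]]:
    """Map each URL that has a query string to its parameter names."""
    flagged = {}
    for url in urls:
        query = urlsplit(url).query
        if query:
            flagged[url] = [name for name, _ in parse_qsl(query, keep_blank_values=True)]
    return flagged


urls = [
    "https://yourdomain.com/product?source=newsletter&session=abc123",
    "https://yourdomain.com/product",
]
print(parameterized_urls(urls))
# -> {'https://yourdomain.com/product?source=newsletter&session=abc123': ['source', 'session']}
```

Grouping the results by parameter name usually makes it obvious whether the contamination comes from tracking, sessions, sorting, or pagination.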

Root Cause 5: Faceted Navigation URLs

Faceted navigation (filter and sort parameters on category pages) is one of the most prolific generators of non-canonical sitemap URLs. An e-commerce site with 50 product categories, each supporting 10 filter dimensions with 5 options each, can produce tens of thousands of faceted URL combinations even when only one or two filters are applied at a time, and far more if every combination is reachable. Most of these are canonicalized back to the base category URL.
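
To see how quickly the numbers grow, here is a rough back-of-the-envelope calculation. It assumes each filter dimension is either unset or set to exactly one option, that parameter order does not create extra URLs, and that sort parameters are ignored; real sites may produce more or fewer variants.

```python
from math import comb


def faceted_url_count(categories: int, dimensions: int, options: int, max_filters: int) -> int:
    """Count filtered URLs when up to max_filters dimensions are applied at once."""
    per_category = sum(
        comb(dimensions, k) * options**k  # choose k dimensions, one option for each
        for k in range(1, max_filters + 1)
    )
    return categories * per_category


print(faceted_url_count(50, 10, 5, max_filters=1))   # -> 2500
print(faceted_url_count(50, 10, 5, max_filters=2))   # -> 58750
print(faceted_url_count(50, 10, 5, max_filters=10))  # -> 3023308750
```

Under these assumptions, two simultaneous filters are already enough to pass 58,000 URLs, all competing with 50 canonical category pages.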

If your sitemap generator crawls your live site to discover URLs (rather than reading from a content database), it will pick up these faceted URLs from links in the navigation or from XML sitemaps generated by plugins that don't filter them. The result: a sitemap full of URLs like https://yourdomain.com/shoes?color=red&size=10&sort=price-asc that all canonicalize to https://yourdomain.com/shoes.

The correct approach is to generate sitemaps from your content database, not from a site crawl. If you know which category pages are canonical, list only those. Any URL generated by applying filters on top of a category page should not appear in your sitemap unless you've deliberately chosen to index that facet.

Root Cause 6: Printer-Friendly and Alternate Format Pages

Older CMS platforms and some publishing tools generate alternate versions of content pages: printer-friendly URLs (/post/123/print), AMP versions (/amp/post/123), PDF exports, or mobile-specific pages. These alternate versions typically canonicalize back to the main page.

If your sitemap plugin or generator doesn't explicitly exclude these URL patterns, they end up in your sitemap as non-canonical URLs. AMP URLs are a particular source of confusion: older advice to list them in a separate AMP sitemap is outdated, and it is actively counterproductive when the AMP page canonicalizes to a main page.

Audit your sitemap for URL patterns containing /print, /amp, /mobile, .pdf, or any other indicator of an alternate format. These should be excluded from your main sitemap unless they are themselves the canonical version of that content.
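
A pattern scan along these lines can be sketched in a few lines of Python. The regular expression below is an assumption based on the patterns named above (/print, /amp, /mobile as whole path segments, plus a .pdf suffix); adjust it to the URL scheme your CMS actually uses.

```python
import re
from urllib.parse import urlsplit

# Matches print/amp/mobile as a whole path segment, or a path ending in .pdf.
ALTERNATE_FORMAT = re.compile(r"(^|/)(print|amp|mobile)(/|$)|\.pdf$", re.IGNORECASE)


def alternate_format_urls(urls: list[str]) -> list[str]:
    """Return URLs whose path looks like an alternate-format version."""
    return [u for u in urls if ALTERNATE_FORMAT.search(urlsplit(u).path)]


urls = [
    "https://yourdomain.com/post/123/print",
    "https://yourdomain.com/amp/post/123",
    "https://yourdomain.com/guides/setup.pdf",
    "https://yourdomain.com/blog/printing-tips",
]
print(alternate_format_urls(urls))
# -> ['https://yourdomain.com/post/123/print', 'https://yourdomain.com/amp/post/123', 'https://yourdomain.com/guides/setup.pdf']
```

Matching whole segments rather than substrings keeps ordinary slugs such as /blog/printing-tips out of the report.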

The Systematic Fix Process

Fixing non-canonical URLs in a sitemap requires three things: an audit to find the conflicts, an analysis to categorize them, and a fix applied either at the sitemap generator or at the canonical tag level.

Step 1: Audit Sitemap URLs Against Canonical Tags

Download your sitemap and extract all <loc> values into a list. Then crawl those specific URLs (using Screaming Frog in List mode, or a custom script) and extract the rel=canonical value from each page's HTML head.

Compare the two columns: sitemap URL vs. canonical URL. Any row where they differ is a non-canonical sitemap entry. Export this mismatch report — it will tell you both the scope of the problem and the pattern causing it.
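
If you prefer a custom script over a crawler, the comparison itself is simple. The sketch below uses only the standard library and assumes you have already fetched each sitemap URL's HTML into a dictionary (fetching, redirects, and rate limiting are left out). The names `CanonicalParser`, `extract_canonical`, and `find_mismatches` are invented for this example, and the parser takes the first canonical link it sees without checking that it sits inside the head.

```python
from html.parser import HTMLParser


class CanonicalParser(HTMLParser):
    """Records the href of the first <link rel="canonical"> in a document."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attr = dict(attrs)
        rels = (attr.get("rel") or "").lower().split()
        if "canonical" in rels:
            self.canonical = attr.get("href")


def extract_canonical(html: str):
    parser = CanonicalParser()
    parser.feed(html)
    return parser.canonical


def find_mismatches(pages: dict[str, str]) -> list[tuple]:
    """pages maps sitemap URL -> fetched HTML; returns (sitemap_url, canonical) rows that differ."""
    rows = []
    for sitemap_url, html in pages.items():
        canonical = extract_canonical(html)
        if canonical != sitemap_url:  # exact, character-for-character comparison
            rows.append((sitemap_url, canonical))
    return rows


pages = {
    "https://yourdomain.com/blog/my-post":
        '<head><link rel="canonical" href="https://yourdomain.com/blog/my-post/"></head>',
    "https://yourdomain.com/about/":
        '<head><link rel="canonical" href="https://yourdomain.com/about/" /></head>',
}
print(find_mismatches(pages))
# -> [('https://yourdomain.com/blog/my-post', 'https://yourdomain.com/blog/my-post/')]
```

Pages with no canonical tag at all show up with a canonical of None, which is worth reviewing separately.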

Step 2: Identify the Pattern

Most sites have one or two dominant patterns in their mismatch report. Sort the differences and look for common causes:

  • Trailing slash: all mismatches are exactly one character different (the trailing slash). Fix at the sitemap generator level to match the canonical format.
  • HTTP vs HTTPS: all sitemap URLs start with http:// and all canonicals start with https://. Fix the base URL in your sitemap generator.
  • Parameter URLs: sitemap URLs have query strings, canonicals don't. Fix the sitemap generator to exclude parameterized URLs, or fix the filter at the URL discovery stage.
  • Random mismatches with no pattern: canonical tags on individual pages are wrong. Fix at the page level — the canonical tags need to be corrected to match the actual preferred URL.
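
Sorting a long mismatch report by eye gets tedious, so the categories above can be approximated in code. This is a rough sketch: the labels and the order of the checks in `classify_mismatch` are assumptions, and a row with more than one difference is reported under the first rule it hits.

```python
from collections import Counter
from urllib.parse import urlsplit


def classify_mismatch(sitemap_url: str, canonical_url: str) -> str:
    """Label the most likely cause of a sitemap/canonical mismatch."""
    s, c = urlsplit(sitemap_url), urlsplit(canonical_url)
    if s.scheme != c.scheme:
        return "scheme"            # http vs https
    if s.netloc.lower() != c.netloc.lower():
        return "host"              # www vs non-www
    if s.path != c.path and s.path.rstrip("/") == c.path.rstrip("/") and s.query == c.query:
        return "trailing-slash"
    if s.path == c.path and s.query and not c.query:
        return "parameters"
    return "other"                 # likely a page-level canonical problem


mismatches = [
    ("https://yourdomain.com/blog/my-post", "https://yourdomain.com/blog/my-post/"),
    ("http://yourdomain.com/page", "https://yourdomain.com/page"),
    ("https://yourdomain.com/shoes?color=red&size=10&sort=price-asc", "https://yourdomain.com/shoes"),
]
print(Counter(classify_mismatch(s, c) for s, c in mismatches))
# -> Counter({'trailing-slash': 1, 'scheme': 1, 'parameters': 1})
```

A tally dominated by one label points to a generator-level fix; a large "other" bucket suggests the canonical tags themselves need attention.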

Step 3: Fix at Source vs. Fix in the Sitemap Generator

There are two places to fix a sitemap/canonical mismatch:

Fix the sitemap generator when the canonical tags are correct and the sitemap is generating wrong URLs. This is the right approach for trailing slash issues, HTTP/HTTPS mismatches, and parameter contamination — the canonical is the source of truth, so bring the sitemap into alignment with it.

Fix the canonical tags when the sitemap is correct and the canonical tags are wrong: for example, after a domain migration where canonical tags still point to the old domain, or when a CMS migration set canonical tags to the wrong URL.

Never "fix" both simultaneously by changing them to meet in the middle. Pick the authoritative source — usually the canonical tag — and bring everything else into alignment with it.

After the Fix: Verification

After updating your sitemap generator or canonical tags, re-run the audit:

  1. Fetch the updated sitemap and extract all URLs.
  2. Re-crawl those URLs with Screaming Frog to verify canonical tags now match sitemap URLs exactly.
  3. Resubmit the sitemap in Google Search Console. This prompts Google to fetch the sitemap again and refreshes the submitted URL count.
  4. Monitor the GSC Pages report over the following 2–4 weeks. The "Alternate page with proper canonical tag" count should decrease as Google recrawls the affected pages.

For large sites with thousands of mismatches, prioritize fixing the most frequently crawled pages first — those with the most internal links and highest PageRank. Fixing 20% of the mismatches that cover 80% of your crawl traffic will produce noticeable results before the tail pages are resolved.
