Fix Your Sitemap for Webflow

Updated April 2026 · By SitemapFixer Team

Webflow auto-generates a sitemap, but CMS collections, paginated collection lists, and staging subdomains routinely cause issues that hurt indexing and crawl efficiency.


Webflow's sitemap comes in one of two modes: auto (every public page) or custom (paste your own XML and maintain it manually). There is little middle ground. Auto mode has no filter patterns and no rule-based exclusions. Beyond the "Exclude from sitemap" toggles in page and item settings, you either accept what Webflow produces or you hand-write the file.

We fixed a Webflow agency site that had 210 published pages but 680 URLs in the auto sitemap. The extras were paginated CMS Collection List URLs (?page=2 through ?page=14) that Webflow had emitted as separate URLs, plus four Collection template pages the team had duplicated and forgotten. Switching to a custom sitemap containing only canonical URLs fixed coverage in Google Search Console (GSC) within three weeks.

Common Webflow Sitemap Issues

The webflow.io staging leak

Free-plan Webflow sites expose the *.webflow.io staging subdomain without noindex. Paid plans add the noindex header automatically. Either way, connect a custom domain as soon as possible. Once the custom domain is primary, Webflow serves the sitemap from the custom domain, and the .webflow.io URLs drop out of search results over a few weeks.

Robots.txt workaround

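Webflow lets you edit robots.txt at Project Settings > SEO > robots.txt. The rules below belong on the custom domain and keep utility pages and paginated list URLs from being crawled. Don't rely on robots.txt alone to keep the staging subdomain out of search: a Disallow rule stops crawling but does not guarantee that a URL stays out of the index, so the noindex header is what actually protects staging: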

# Custom domain robots.txt
User-agent: *
Disallow: /401
Disallow: /404
Disallow: /style-guide
Disallow: /detail_*
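# Assumption: pagination uses a plain ?page= parameter. If your URLs carry a
# prefixed parameter (for example ?abc123_page=2), widen this pattern to match.
Disallow: /*?page=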

Sitemap: https://yourdomain.com/sitemap.xml
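
Before publishing rules like these, it helps to check which URLs they would actually block. The sketch below is a minimal, hand-rolled matcher for the wildcard syntax (`*` and a trailing `$`) that Google documents for robots.txt. It is written by hand because Python's built-in urllib.robotparser treats paths as plain prefixes. The function names are illustrative, and the sketch ignores Allow rules and longest-match precedence, so treat it as a quick sanity check rather than a full parser.

```python
import re

# Disallow patterns copied from the robots.txt above.
DISALLOW = ["/401", "/404", "/style-guide", "/detail_*", "/*?page="]


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a compiled regex.

    '*' matches any run of characters; a trailing '$' anchors the end.
    Without '$', the pattern behaves as a prefix match.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape every literal piece, then join the pieces with '.*'.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))


def is_blocked(path_and_query: str, patterns=DISALLOW) -> bool:
    """Return True if any Disallow pattern matches the path plus query string."""
    return any(pattern_to_regex(p).match(path_and_query) for p in patterns)
```

With these rules, `is_blocked("/blog?page=2")` is true and `is_blocked("/blog")` is false. A URL such as `/blog?abc123_page=2` (a hypothetical prefixed parameter) is not blocked, which is the kind of gap this check is meant to surface.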

CMS Collections and pagination

A Webflow Collection List shows at most 100 items per page, so larger collections need pagination. The pagination URLs (?page=2, ?page=3) get crawled but shouldn't be indexed: they are list pages that differ only in which items they show, with no unique content of their own. Block *?page= in robots.txt (above). You can also set rel=next/prev via Custom Code to describe the sequence, though Google has said it no longer uses those links as an indexing signal, so treat them as optional. For CMS item detail pages, use the per-item "Exclude from sitemap" toggle only on items that really shouldn't be indexed (drafts, internal resources).
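
To see how many pagination URLs have crept into a list of sitemap URLs, a small filter is enough. The sketch below is one possible approach: it treats a URL as paginated when a query parameter is named `page` or ends in `_page`. The `_page` suffix is an assumption that covers prefixed parameter names, so adjust it to whatever your own URLs look like. The function names are illustrative.

```python
from urllib.parse import parse_qsl, urlsplit


def is_paginated(url: str) -> bool:
    """True if any query parameter is named 'page' or ends in '_page'."""
    query = urlsplit(url).query
    return any(
        key == "page" or key.endswith("_page")
        for key, _ in parse_qsl(query, keep_blank_values=True)
    )


def partition(urls):
    """Split URLs into (canonical, paginated) lists, preserving order."""
    canonical = [u for u in urls if not is_paginated(u)]
    paginated = [u for u in urls if is_paginated(u)]
    return canonical, paginated
```

Calling `partition()` on the URLs from your sitemap gives a canonical list to keep and a paginated list to investigate.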

Step-by-Step Fix Guide

  1. Connect a custom domain and set it as primary in Project Settings > Hosting
  2. In Project Settings > SEO, enable Auto-generate sitemap.xml and verify base URL
  3. Add robots.txt rules blocking ?page=, /401, /404, /style-guide
  4. Mark utility pages as "Exclude from sitemap" in Page Settings > SEO
  5. Per CMS item: toggle "Exclude from sitemap" for anything not ready
  6. Publish the site (Webflow only regenerates sitemap.xml on publish)
  7. Verify with curl https://yourdomain.com/sitemap.xml - spot-check URL count
  8. Confirm staging returns noindex: curl -I https://yoursite.webflow.io
  9. Submit the custom-domain sitemap to Google Search Console
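
Steps 7 and 8 can also be scripted. The following is a rough sketch using only the Python standard library: it counts the <loc> entries in a sitemap and checks an X-Robots-Tag value for noindex. The function names are illustrative, and yourdomain.com and yoursite.webflow.io are the same placeholders used in the steps above. It reads only the first X-Robots-Tag header and does not follow sitemap index files.

```python
import urllib.request
import xml.etree.ElementTree as ET
from typing import Optional

# XML namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def count_sitemap_urls(xml_bytes: bytes) -> int:
    """Count <url><loc> entries in a sitemap document."""
    root = ET.fromstring(xml_bytes)
    return len(root.findall(f"{SITEMAP_NS}url/{SITEMAP_NS}loc"))


def has_noindex(x_robots_tag: Optional[str]) -> bool:
    """True if an X-Robots-Tag value contains 'noindex' or 'none'."""
    if not x_robots_tag:
        return False
    # Drop an optional 'googlebot:'-style prefix from each directive.
    directives = [d.split(":")[-1].strip().lower() for d in x_robots_tag.split(",")]
    return "noindex" in directives or "none" in directives


def fetch(url: str, method: str = "GET"):
    """Return (headers, body) for a URL; the body is empty for HEAD requests."""
    request = urllib.request.Request(url, method=method)
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.headers, response.read()


def check_site(domain: str, staging: str) -> None:
    """Print the sitemap URL count and whether staging sends noindex."""
    _, body = fetch(f"https://{domain}/sitemap.xml")
    print("URLs in sitemap:", count_sitemap_urls(body))
    headers, _ = fetch(f"https://{staging}", method="HEAD")
    print("Staging noindex:", has_noindex(headers.get("X-Robots-Tag")))
```

Run `check_site("yourdomain.com", "yoursite.webflow.io")` after each publish and compare the URL count against the number of pages you expect to be indexed.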

Frequently Asked Questions

How do I stop webflow.io staging from being indexed?
Webflow adds noindex to the staging subdomain automatically on paid plans. Verify with curl -I https://yoursite.webflow.io and look for X-Robots-Tag: noindex. On free plans, staging is indexable - which is one reason to upgrade or move to a custom domain fast.
Why aren't my CMS Collection items in the sitemap?
Three possibilities: items are set to Draft, the collection template has 'Exclude from sitemap' enabled, or you haven't published the site since adding them. Webflow regenerates sitemap.xml only on publish.
Can I edit Webflow's sitemap.xml directly?
Partially. Disable auto-generation in Project Settings > SEO > Sitemap and paste your own XML into the custom sitemap field. You'll need to maintain it manually, which only makes sense for very small or specific edge-case sites.
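
If you do go the custom route, generating the XML from a list of canonical URLs is less error-prone than writing it by hand. This is a minimal sketch, assuming Python 3.9+ and a plain list of URLs. The `build_sitemap` name is illustrative, and it emits only <loc> entries (no lastmod or priority).

```python
import xml.etree.ElementTree as ET

# XML namespace defined by the sitemaps.org protocol.
SITEMAP_XMLNS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls: list[str]) -> str:
    """Return sitemap XML for the given URLs, de-duplicated and sorted."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_XMLNS)
    for url in sorted(set(urls)):
        entry = ET.SubElement(urlset, "url")
        # ElementTree escapes characters such as '&' when serializing.
        ET.SubElement(entry, "loc").text = url
    ET.indent(urlset)  # pretty-print; available from Python 3.9
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body + "\n"
```

Paste the returned string into the custom sitemap field, and regenerate it whenever pages are added or removed.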