By SitemapFixer Team
Updated April 2026

XML Sitemap Best Practices for SEO (2026)

Following XML sitemap best practices is one of the highest-leverage technical SEO actions you can take. A well-structured sitemap signals quality to search engines, ensures your most important content gets crawled efficiently, and helps you catch indexing problems before they affect rankings. This guide covers every best practice that matters in 2026, from what to include and exclude to how to monitor and validate your sitemap over time.

Include Only Your Best Pages

Your sitemap is not an inventory of every URL on your site — it is a curated list of the pages you want Google to discover and index. Including low-quality pages dilutes the signal your sitemap sends and can lead Google to crawl junk instead of your best content. Exclude pages with noindex meta tags, thin or duplicate content, faceted navigation URLs that generate thousands of near-identical variations, pagination pages beyond the first, and utility pages like search results, login screens, or thank-you pages. A good rule of thumb: if you would be embarrassed to show a URL to a potential customer, it does not belong in your sitemap. Quality over quantity is the governing principle — a sitemap with 200 strong pages outperforms one with 2,000 pages that includes low-value content.
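The curation step can live directly in whatever script or plugin generates your sitemap. Below is a minimal Python sketch of that idea; the all_site_urls and noindex_urls inputs and the exclusion patterns are assumptions for illustration, not a complete rule set.

    # Illustrative sketch: curate URLs before they reach the sitemap.
    # all_site_urls and noindex_urls are assumed to come from your own CMS or crawl.
    EXCLUDED_PATTERNS = ("/search", "/login", "/thank-you", "/cart", "?page=")

    def curate_for_sitemap(all_site_urls, noindex_urls):
        keep = []
        for url in all_site_urls:
            if url in noindex_urls:
                continue  # pages marked noindex never belong in the sitemap
            if any(pattern in url for pattern in EXCLUDED_PATTERNS):
                continue  # utility pages and low-value parameter variants
            keep.append(url)
        return keep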

Use Only Canonical URLs

Every URL in your sitemap must be the canonical version — the exact URL you want indexed, with no redirects, no tracking parameters, and no alternate-protocol variants. If you list a URL that redirects to another page, you are telling Google to follow a chain instead of landing directly on the intended destination, which wastes crawl budget and weakens the indexing signal. Strip all UTM parameters, session IDs, and query string variants before including any URL in your sitemap. If your site is accessible at both www.example.com and example.com, pick one as canonical and use only that form consistently. Mixing HTTP and HTTPS variants in the same sitemap is a common error that confuses crawlers about which version is authoritative — choose one protocol and apply it to every URL.
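A small normalization pass makes this enforceable rather than aspirational. The Python sketch below, using only the standard library, strips common tracking parameters and forces one protocol and host; the CANONICAL_HOST value and the parameter list are assumptions for the example.

    from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

    CANONICAL_HOST = "www.example.com"  # assumed canonical host for this example
    TRACKING_PREFIXES = ("utm_", "gclid", "fbclid", "sessionid")

    def normalize(url):
        parts = urlsplit(url)
        # drop tracking parameters and session IDs from the query string
        query = [(key, value) for key, value in parse_qsl(parts.query)
                 if not key.lower().startswith(TRACKING_PREFIXES)]
        # force HTTPS and the single canonical host
        return urlunsplit(("https", CANONICAL_HOST, parts.path, urlencode(query), ""))

With those rules, a URL like http://example.com/page?utm_source=newsletter normalizes to https://www.example.com/page before it is ever written into the sitemap.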

Keep All URLs Indexable

A URL in your sitemap that is blocked from indexing sends a contradictory signal: you are asking Google to index it while simultaneously telling it not to. Any URL with a noindex meta tag or X-Robots-Tag header must be removed from your sitemap immediately. Similarly, URLs blocked in your robots.txt file should never appear in the sitemap — Googlebot respects the disallow directive and will not fetch the page, so the entry is wasted at best; at worst the URL can still be indexed without its content and show up in Search Console as "Indexed, though blocked by robots.txt." Run a crawl of your sitemap URLs periodically to catch any page that has acquired a noindex tag or robots.txt block since it was added. Tools like Screaming Frog and SitemapFixer can flag these mismatches automatically during a site audit.
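That periodic check does not require a full crawler. The simplified sketch below uses the requests library and a rough substring test to look for a noindex signal in the X-Robots-Tag header or the page body; it deliberately ignores robots.txt parsing, rate limiting, and JavaScript-rendered tags.

    import requests

    def find_noindexed(sitemap_urls):
        # flag sitemap URLs that appear to carry a noindex directive
        flagged = []
        for url in sitemap_urls:
            resp = requests.get(url, timeout=10)
            header = resp.headers.get("X-Robots-Tag", "").lower()
            body = resp.text.lower()
            # crude heuristic: a robots meta tag plus the word noindex in the HTML
            has_meta_noindex = 'name="robots"' in body and "noindex" in body
            if "noindex" in header or has_meta_noindex:
                flagged.append(url)
        return flagged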

Use Accurate lastmod Dates

The lastmod attribute tells search engines when a URL was last meaningfully changed, giving them a signal to prioritize recrawling updated content over unchanged pages. Accurate lastmod dates are genuinely useful — if you publish a major update to a page, an updated lastmod date encourages Googlebot to recrawl it sooner. Fake or static lastmod dates — every URL showing today's date regardless of actual change history — train Google to distrust your sitemap signals entirely and eventually ignore the field. The right approach is to generate lastmod automatically from your CMS or database: use the actual date when the page content was last modified, not the date the sitemap was generated. For most CMS platforms (WordPress, Shopify, Next.js with a build step), sitemap plugins and libraries can pull accurate modification timestamps automatically without manual maintenance.
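In the sitemap itself, lastmod is a simple per-URL element in W3C datetime format, either a date or a date with time and timezone. An illustrative entry with made-up values:

    <url>
      <loc>https://www.example.com/blog/pricing-update</loc>
      <lastmod>2026-03-18</lastmod>
    </url>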

HTTPS Everywhere

Every URL in your sitemap must use HTTPS. Listing HTTP URLs in a sitemap that serves HTTPS content forces Google to follow a redirect on every crawl, wasting crawl budget and weakening the association between your sitemap submission and your canonical HTTPS pages. After migrating to HTTPS or adding a new URL pattern, audit your sitemap to confirm there are no HTTP variants remaining — this error is more common than you would expect, especially on large sites where URL generation logic is spread across multiple templates or plugins. Verify that your HTTPS redirects are correct 301 redirects (not 302), that there are no redirect chains (HTTP to HTTP to HTTPS), and that the destination HTTPS URL is the same as the URL listed in the sitemap. A single HTTP URL in a sitemap of thousands is easy to miss manually but easy to catch with an automated validator.
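Catching stray HTTP entries is a natural job for a script. The sketch below assumes the sitemap has been downloaded to a local file and simply lists every <loc> that does not use HTTPS; the flagged URLs can then be checked by hand or with a redirect-aware crawler.

    import xml.etree.ElementTree as ET

    NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

    def non_https_locs(sitemap_path):
        # return every <loc> in the sitemap file that is not an HTTPS URL
        tree = ET.parse(sitemap_path)
        locs = [el.text.strip() for el in tree.getroot().findall("sm:url/sm:loc", NS)]
        return [u for u in locs if not u.startswith("https://")]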

Split Large Sitemaps Into Sitemap Index Files

The Sitemaps protocol limits each sitemap file to 50,000 URLs and 50 MB uncompressed. Sites that exceed either limit must use a sitemap index file — an XML file that lists multiple individual sitemap files rather than listing URLs directly. Even if your site is below the hard limits, splitting your sitemap by content type (posts, products, categories, landing pages) makes it significantly easier to diagnose indexing problems: if your product pages are not being indexed but your blog posts are, a split sitemap lets you see that in Google Search Console immediately. The sitemap index format uses a sitemapindex root element containing sitemap child elements, each with a loc pointing to an individual sitemap file. Submit the sitemap index URL to Google Search Console and all individual sitemaps will be crawled as part of that submission — you do not need to submit each one separately.
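For reference, a minimal sitemap index split by content type looks like the example below; the file names and dates are illustrative.

    <?xml version="1.0" encoding="UTF-8"?>
    <sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <sitemap>
        <loc>https://www.example.com/sitemap-posts.xml</loc>
        <lastmod>2026-04-01</lastmod>
      </sitemap>
      <sitemap>
        <loc>https://www.example.com/sitemap-products.xml</loc>
        <lastmod>2026-04-10</lastmod>
      </sitemap>
    </sitemapindex>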

Submit and Monitor in Google Search Console

Submitting your sitemap to Google Search Console is the most direct way to tell Google where to find your pages and to track how many of your submitted URLs are being indexed. Go to Search Console, select your property, navigate to Sitemaps under the Indexing section, and enter your sitemap URL. After submission, Google Search Console shows a breakdown of discovered URLs versus indexed URLs — a large gap between the two is a signal that some pages have indexing issues worth investigating. Check your sitemap status in Search Console at least monthly, and immediately after any major site change such as a redesign, migration, or large content update. The Last Read date shown in the Sitemaps report tells you when Googlebot last fetched your sitemap — if it has not been read in several weeks, check that your sitemap URL is accessible and returning a 200 status code.

Validate Before Every Major Change

Sitemap errors can silently block pages from being discovered for weeks or months before anyone notices. Before deploying a new sitemap — after a CMS change, a platform migration, or a significant URL restructure — run it through a validator to catch problems before Google sees them. Validators check for malformed XML that will cause the entire sitemap to be rejected, URLs returning 4xx or 5xx status codes, URLs that redirect rather than resolve directly, canonical mismatches between sitemap URLs and page-level canonical tags, and protocol inconsistencies between HTTP and HTTPS. SitemapFixer's free sitemap checker catches all of these issues and surfaces them with clear explanations, so you can fix errors before they affect crawling and indexing. Treating sitemap validation as a pre-deployment step — rather than a reactive measure after noticing ranking drops — is one of the highest-leverage habits in technical SEO.
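Alongside a dedicated validator, a lightweight script can cover three of those checks before deployment: non-200 responses, redirects, and canonical mismatches. The sketch below is a simplification; it relies on a crude regex for the canonical tag and assumes the list of sitemap URLs has already been extracted.

    import re
    import requests

    CANONICAL_RE = re.compile(r'<link[^>]+rel="canonical"[^>]+href="([^"]+)"', re.I)

    def validate(sitemap_urls):
        # map each problem URL to a short description of what is wrong
        problems = {}
        for url in sitemap_urls:
            resp = requests.get(url, allow_redirects=False, timeout=10)
            if 300 <= resp.status_code < 400:
                problems[url] = "redirects to " + resp.headers.get("Location", "?")
            elif resp.status_code != 200:
                problems[url] = "returns HTTP " + str(resp.status_code)
            else:
                match = CANONICAL_RE.search(resp.text)
                if match and match.group(1) != url:
                    problems[url] = "canonical points to " + match.group(1)
        return problems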
