Fix Your Sitemap for E-commerce Sites
E-commerce sitemap best practices: handling product variants, out-of-stock pages, faceted navigation, and seasonal content without blowing up your crawl budget.
E-commerce sitemaps fail in a very particular way. You start with 2,000 products and end up with a sitemap claiming 87,000 URLs because every size, color, sort order, and filter got its own entry. Google crawls those URLs, burns through your crawl budget on near-duplicates, and your actual new arrivals get crawled late or not at all.
I recently audited a mid-size fashion store on a custom Magento build: 14,000 real products, but a sitemap reporting 412,000 URLs. Every faceted URL (?color=red&size=m&sort=price-asc) was in there. GSC coverage was a mess, with 63% of URLs marked "Crawled - currently not indexed". The fix took one afternoon, and coverage climbed to 89% within six weeks.
Common E-commerce Sitemap Issues
- Faceted navigation URLs (filters, sort orders) bloating the sitemap with near-duplicates
- Every product variant listed instead of the canonical parent
- Out-of-stock or discontinued products left in the sitemap pointing at 404s or soft-404s
- Tag, search, and wishlist URLs included by default
- Missing image sitemap entries for product galleries
- Paginated category pages (?page=2, ?page=3) either all included or all excluded - neither is right
- lastmod set to the sitemap generation time instead of the actual product update time
- No split by content type - one giant sitemap with products, categories, and CMS pages jumbled together
What most tutorials get wrong
Every generic ecommerce SEO guide says "exclude faceted URLs". That is correct but incomplete. The real question is which facets deserve their own indexable page. Brand and category + brand combinations usually do - users search "nike running shoes" and you want a landing page for it. Color and size filters almost never do.
The lazy fix is noindex, follow on all filter combinations. The better fix is to promote a short list of high-intent facet URLs to real landing pages, canonicalize the rest to the unfiltered category, and only list the promoted ones in the sitemap.
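That rule can be sketched as a small URL classifier. This is a minimal sketch assuming a hypothetical allowlist in which `brand` is the only promoted facet, and every other query parameter (color, size, sort) is treated as non-indexable. It returns whether the URL belongs in the sitemap and which URL it should canonicalize to.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical allowlist: facets promoted to real landing pages.
PROMOTED_FACETS = {"brand"}


def classify_facet_url(url: str) -> tuple[bool, str]:
    """Return (include_in_sitemap, canonical_url) for a category URL."""
    parts = urlsplit(url)
    params = parse_qsl(parts.query)
    unfiltered = urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))
    if not params:
        # Plain category page: indexable as-is.
        return True, unfiltered
    keys = [key for key, _ in params]
    # Promoted only if every parameter is allowlisted and none repeats
    # (e.g. brand=nike&brand=adidas is still a filter combination).
    if all(key in PROMOTED_FACETS for key in keys) and len(keys) == len(set(keys)):
        # Sort parameters so one promoted page has exactly one canonical URL.
        query = urlencode(sorted(params))
        return True, urlunsplit((parts.scheme, parts.netloc, parts.path, query, ""))
    # Everything else canonicalizes to the unfiltered category.
    return False, unfiltered
```

With this allowlist, `https://store.com/running-shoes?brand=nike` stays in the sitemap, while `?color=red&size=m&sort=price-asc` on the same category canonicalizes to `https://store.com/running-shoes`.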
Recommended sitemap split
https://store.com/sitemap.xml              # index file
https://store.com/sitemap-products.xml     # canonical product URLs only
https://store.com/sitemap-categories.xml   # top-level + promoted facets
https://store.com/sitemap-pages.xml        # about, shipping, policies
https://store.com/sitemap-blog.xml         # editorial content
https://store.com/sitemap-images.xml       # product gallery images
Splitting by type matters because GSC reports coverage per sitemap file. When your product sitemap shows 71% indexed but your category sitemap shows 98%, you know where to dig.
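The index file that ties the split together is small. Below is a minimal sketch using Python's standard-library `xml.etree.ElementTree`. It assumes you already know each child sitemap's URL and the most recent real update date among its entries, and you pass `None` to omit `lastmod`.

```python
import xml.etree.ElementTree as ET
from typing import Optional

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap_index(children: dict[str, Optional[str]]) -> str:
    """Build a sitemap index from {child_sitemap_url: lastmod or None}."""
    root = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for loc, lastmod in children.items():
        entry = ET.SubElement(root, "sitemap")
        ET.SubElement(entry, "loc").text = loc
        if lastmod:
            # Newest real update among that file's URLs, not the build time.
            ET.SubElement(entry, "lastmod").text = lastmod
    # Write the declaration by hand so it always states UTF-8.
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + ET.tostring(root, encoding="unicode")
```

For example, `build_sitemap_index({"https://store.com/sitemap-products.xml": "2026-04-10", "https://store.com/sitemap-pages.xml": None})` produces an index with two `<sitemap>` entries, only the first carrying a `lastmod`.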
Handling out-of-stock products
Three options, pick one consistently:
- Keep the page live with an "out of stock" message and a restock signup. Leave it in the sitemap. Best for items you will restock.
- Redirect (301) to the closest equivalent product or the parent category. Remove from sitemap. Best for discontinued items with a clear successor.
- 410 Gone for truly discontinued items with no equivalent. Remove from sitemap. Google treats a 410 as a slightly more explicit "gone" signal than a 404, so the URL may drop out of the index a little faster.
What you should not do: let the page return a 404 while its URL stays in your sitemap. That shows up as a coverage error in GSC and undermines the sitemap as a reliable list of URLs worth crawling.
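One way to keep this consistent is to encode the policy once and derive both the HTTP response and sitemap membership from the same decision, so the two can never disagree. This is a minimal sketch assuming a hypothetical `Product` record with `in_stock`, `restockable`, and `successor_url` fields, where `successor_url` is the closest equivalent product, if one exists.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Product:
    url: str
    in_stock: bool
    restockable: bool
    successor_url: Optional[str] = None  # hypothetical: closest equivalent product


@dataclass
class Decision:
    http_status: int
    location: Optional[str]  # redirect target, if any
    in_sitemap: bool


def decide(product: Product) -> Decision:
    """Map stock status to the HTTP response and sitemap membership together."""
    if product.in_stock or product.restockable:
        # Live page with an "out of stock" notice and restock signup.
        return Decision(200, None, True)
    if product.successor_url:
        # Discontinued with a clear successor: 301 and drop from sitemap.
        return Decision(301, product.successor_url, False)
    # Discontinued with no equivalent: 410 Gone and drop from sitemap.
    return Decision(410, None, False)
```

Because the sitemap generator and the page handler would both call `decide`, a URL can only be listed while it serves a 200.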
Image sitemap for product galleries
Product image traffic from Google Images is underrated. Embed image entries inside the product sitemap so each URL lists its gallery:
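<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">
  <url>
    <loc>https://store.com/products/running-shoe-x1</loc>
    <lastmod>2026-04-10</lastmod>
    <image:image>
      <image:loc>https://store.com/img/x1-main.jpg</image:loc>
      <!-- Google deprecated image:title in 2022 and ignores it; describe images with alt text on the page -->
    </image:image>
    <image:image>
      <image:loc>https://store.com/img/x1-side.jpg</image:loc>
    </image:image>
  </url>
</urlset>
Large catalogs (50k+ URLs)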
Sitemap files cap at 50,000 URLs or 50 MB uncompressed. For large catalogs, chunk by category or by product ID range. Keep each chunk under 40k URLs for headroom, and gzip them. Regenerate only the chunks containing changed products rather than rebuilding every file; a full nightly regeneration is often what breaks CI pipelines on stores with 200k+ SKUs.
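The incremental approach can be sketched in Python under a few assumptions: numeric product IDs, a hypothetical product record shape (`id`, `url`, `updated` as `YYYY-MM-DD`, optional `images`), and chunking by ID range. Because each chunk covers exactly 40,000 IDs, it can never hold more than 40,000 URLs. The sketch embeds gallery images as `image:image` entries, gzips each chunk, and rewrites only the files whose ID range contains a changed product.

```python
import gzip
import xml.etree.ElementTree as ET
from collections import defaultdict
from pathlib import Path

SM_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
IMG_NS = "http://www.google.com/schemas/sitemap-image/1.1"
CHUNK_SIZE = 40_000  # under the 50,000-URL limit, with headroom

# Serialize the sitemap namespace as the default and images as "image:".
ET.register_namespace("", SM_NS)
ET.register_namespace("image", IMG_NS)


def chunk_id(product_id: int) -> int:
    """Bucket by ID range: 0-39,999 -> chunk 0, 40,000-79,999 -> chunk 1, ..."""
    return product_id // CHUNK_SIZE


def render_chunk(products: list[dict]) -> bytes:
    """Render one <urlset>, embedding each product's gallery images."""
    urlset = ET.Element(f"{{{SM_NS}}}urlset")
    for p in products:
        url = ET.SubElement(urlset, f"{{{SM_NS}}}url")
        ET.SubElement(url, f"{{{SM_NS}}}loc").text = p["url"]
        # The product's real update date, never the build time.
        ET.SubElement(url, f"{{{SM_NS}}}lastmod").text = p["updated"]
        for img in p.get("images", []):
            image = ET.SubElement(url, f"{{{IMG_NS}}}image")
            ET.SubElement(image, f"{{{IMG_NS}}}loc").text = img
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)


def write_changed_chunks(products: list[dict], changed_ids: list[int], out_dir: Path) -> list[Path]:
    """Rewrite only the gzipped chunk files whose ID range has a changed product."""
    dirty = {chunk_id(pid) for pid in changed_ids}
    by_chunk = defaultdict(list)
    for p in products:
        cid = chunk_id(p["id"])
        if cid in dirty:
            by_chunk[cid].append(p)
    written = []
    for cid in sorted(dirty):
        path = out_dir / f"sitemap-products-{cid}.xml.gz"
        path.write_bytes(gzip.compress(render_chunk(by_chunk[cid])))
        written.append(path)
    return written
```

Here `products` is assumed to hold only canonical, sitemap-eligible products. Untouched chunk files keep their previous content, so only the index entries for rewritten chunks need a fresh `lastmod`.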
Step-by-Step Fix Guide
- Run a full sitemap audit to list every URL currently included
- Strip filter, sort, and search parameters - anything matching ?color=, ?sort=, or ?q=
- Keep only canonical product URLs - drop variant URLs that canonicalize elsewhere
- Decide your out-of-stock policy and apply it consistently
- Split into product, category, page, blog, and image sitemaps via an index file
- Set lastmod from actual product/category update timestamps, not the build time
- Exclude tag, search, wishlist, cart, account, and login URLs
- Verify with curl -I https://store.com/sitemap.xml that it returns 200, and that it is not serving gzip-compressed bytes without a matching Content-Encoding header
- Submit each sub-sitemap to Google Search Console and monitor coverage per file
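The audit and exclusion steps above can be partly automated. This is a minimal sketch that reads URLs out of an existing sitemap and splits them into keep and drop lists. The blocked parameter names and path segments are assumptions based on the checklist, so adjust them to your store's URL structure. Catching variant URLs that canonicalize elsewhere would require fetching each page, which this sketch does not do.

```python
import xml.etree.ElementTree as ET
from urllib.parse import parse_qs, urlsplit

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
# Assumed filter/sort/search parameter names; extend with your own facets.
BLOCKED_PARAMS = {"color", "size", "sort", "q"}
# Hypothetical path segments for non-indexable sections.
BLOCKED_SEGMENTS = {"tag", "search", "wishlist", "cart", "account", "login"}


def sitemap_urls(xml_text: str) -> list[str]:
    """Return every <loc> in a <urlset> sitemap."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.findall("sm:url/sm:loc", NS) if loc.text]


def audit(urls: list[str]) -> dict[str, list[str]]:
    """Split URLs into 'keep' and 'drop' based on parameters and path segments."""
    report: dict[str, list[str]] = {"keep": [], "drop": []}
    for url in urls:
        parts = urlsplit(url)
        params = set(parse_qs(parts.query, keep_blank_values=True))
        segments = {s for s in parts.path.lower().split("/") if s}
        blocked = (params & BLOCKED_PARAMS) or (segments & BLOCKED_SEGMENTS)
        report["drop" if blocked else "keep"].append(url)
    return report
```

Running `audit(sitemap_urls(...))` on each sub-sitemap gives a quick count of how much of the file is filter and utility noise before you regenerate it.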