Fix Your Sitemap for Magento
Magento 2 (now Adobe Commerce) generates sitemaps via the Catalog XML Sitemap feature, but layered navigation, configurable product children, and multi-store views routinely balloon the sitemap into millions of low-value URLs.
Magento sites get complicated fast. You have store views (one per language/region), configurable products with dozens of simple children, CMS pages, category trees, and layered navigation that loves to emit every filter combination as a crawlable URL. The sitemap config lives across several admin panels, and each needs to be set correctly before the generator produces a clean feed.
One Magento 2 apparel store I worked on had 18,000 configurable products and 140,000 simple children. The default sitemap had 158,000 URLs - every simple child product was listed. After enabling product canonicals and filtering the sitemap to parents only, it dropped to 18,400 URLs. GSC's "crawled, not indexed" count dropped 70% over eight weeks.
Common Magento Sitemap Issues
- Layered navigation URLs (?price=, ?color=) inflating category sitemaps
- Configurable product children (simple products) indexed separately from the parent
- Multi-store views generating duplicate product URLs without hreflang
- Out-of-stock or disabled products staying in sitemap after they're hidden from the storefront
- Category grid/list view duplicates (?product_list_mode=list) as separate URLs
- Session IDs (?SID=) or tracking params (utm_source) on legacy deployments
- Sitemap not regenerating because cron isn't running, or running as wrong user
- One giant sitemap file instead of an index for 500k+ product catalogs
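Several of the issues above (layered navigation, list-mode, SID and tracking parameters) show up as query strings inside the sitemap itself. As a rough first check, a small shell helper can list every sitemap URL that contains a "?". This is a minimal sketch: list_param_urls is a hypothetical helper name, and it assumes the sitemap has already been saved to a local file.

```shell
#!/bin/sh
# list_param_urls SITEMAP_FILE
# Hypothetical helper: print every <loc> value that contains a query string.
list_param_urls() {
  # grep -o emits one <loc>...</loc> match per line, so this also works when
  # the whole sitemap was written on a single line.
  grep -o '<loc>[^<]*</loc>' "$1" \
    | sed -e 's/<loc>//' -e 's/<\/loc>//' \
    | grep -F '?' || true   # "|| true": no matches is a clean result, not an error
}
```

An empty result is what you want; any output points at parameterized URLs that probably should not be in the feed.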
Sitemap config panels to touch
- Stores > Configuration > Catalog > XML Sitemap - frequency, priority, entity inclusion
- Stores > Configuration > Catalog > Catalog > Search Engine Optimization - product canonicals, category canonicals, "Use Categories Path for Product URLs"
- Marketing > SEO & Search > Site Map - create a sitemap record per store view, set file path
- Stores > Configuration > General > Web > URL Options - "Add Store Code to URLs" for multi-store setups
Robots.txt for layered navigation
    User-agent: *
    Disallow: /*?
    Disallow: /catalogsearch/
    Disallow: /customer/
    Disallow: /checkout/
    Disallow: /review/
    Disallow: /sendfriend/
    Disallow: /wishlist/
    Allow: /*?p=   # keep pagination crawlable (optional)
    Sitemap: https://store.com/sitemap.xml
Disallow: /*? is aggressive but right for most Magento stores - layered nav parameters create so many duplicates that blanket-blocking is cleaner than cherry-picking. Allow back the params you actually want crawled (pagination).
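To see how the Disallow: /*? and Allow: /*?p= rules interact, here is a simplified matcher in POSIX shell. It is a sketch under stated assumptions, not a full robots.txt parser: it applies the longest-match rule (Allow wins ties) that Google documents and RFC 9309 describes, ignores User-agent grouping, and does not handle the "$" end anchor. The function names pattern_matches and robots_verdict are hypothetical.

```shell
#!/bin/sh
# pattern_matches PATTERN URL_PATH
# robots.txt-style prefix match where "*" matches any run of characters.
# Simplification: the "$" end-of-URL anchor is not handled.
pattern_matches() {
  local pat rest seg first
  pat=$1; rest=$2; first=1
  while :; do
    seg=${pat%%\**}                  # literal text before the next "*"
    if [ "$first" -eq 1 ]; then
      # The first piece must be a literal prefix of the path.
      case $rest in "$seg"*) rest=${rest#"$seg"} ;; *) return 1 ;; esac
      first=0
    else
      # Later pieces may start anywhere in what is left of the path.
      case $rest in *"$seg"*) rest=${rest#*"$seg"} ;; *) return 1 ;; esac
    fi
    case $pat in
      *\**) pat=${pat#*\*} ;;        # step past the "*" and keep going
      *)    return 0 ;;              # no wildcard left: every piece matched
    esac
  done
}

# robots_verdict RULES_FILE URL_PATH
# Prints "allow" or "disallow". Longest matching pattern wins; Allow wins ties.
# Assumes the file holds a single "User-agent: *" group.
robots_verdict() {
  local rules_file path best_len verdict line directive pattern kind
  rules_file=$1; path=$2; best_len=-1; verdict=allow
  while IFS= read -r line || [ -n "$line" ]; do
    line=${line%%#*}                                  # drop comments
    directive=${line%%:*}
    pattern=$(printf '%s' "${line#*:}" | tr -d '[:space:]')
    case $directive in
      [Aa]llow)    kind=allow ;;
      [Dd]isallow) kind=disallow ;;
      *)           continue ;;                        # User-agent, Sitemap, blanks
    esac
    [ -n "$pattern" ] || continue                     # empty rule matches nothing
    pattern_matches "$pattern" "$path" || continue
    if [ "${#pattern}" -gt "$best_len" ] ||
       { [ "${#pattern}" -eq "$best_len" ] && [ "$kind" = allow ]; }; then
      best_len=${#pattern}
      verdict=$kind
    fi
  done < "$rules_file"
  printf '%s\n' "$verdict"
}
```

With the rules above saved to a file, robots_verdict robots.txt '/women?p=2' reports allow while '/women?color=red' reports disallow. One consequence worth knowing: Allow: /*?p= only matches when p is the first parameter, so '/women?color=red&p=2' stays blocked but '/women?p=2&color=red' becomes crawlable.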
CLI generation and cron
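    # Generate now. If your install has no sitemap:generate command, use the
    # Generate link in Marketing > SEO & Search > Site Map instead.
    bin/magento sitemap:generate

    # Verify cron is running (sitemap generation runs in the default cron group)
    bin/magento cron:run --group=default
    tail -f var/log/cron.log

    # Enable scheduled generation and set its frequency under
    # Stores > Configuration > Catalog > XML Sitemap > Generation Settings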
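A quick way to tell whether scheduled generation is actually happening is to look at the modification time of the generated files. The sketch below lists sitemap files older than a given age. stale_sitemaps is a hypothetical helper, the sitemap*.xml naming is an assumption, and the directory to scan depends on the path set in each sitemap record.

```shell
#!/bin/sh
# stale_sitemaps DIR [MAX_AGE_MINUTES]
# Hypothetical helper: list sitemap*.xml files in DIR whose last modification
# is more than MAX_AGE_MINUTES ago (default 1440 minutes, i.e. 24 hours).
stale_sitemaps() {
  local dir max_age_min
  dir=$1
  max_age_min=${2:-1440}
  # -mmin +N selects files modified more than N minutes ago.
  find "$dir" -maxdepth 1 -type f -name 'sitemap*.xml' -mmin "+$max_age_min"
}
```

For example, stale_sitemaps pub 1440 (assuming the files are written to pub/) prints nothing when every sitemap was regenerated within the last day. Any path it does print is a hint that cron is not running or is running as a user that cannot write the file.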
Multi-store hreflang
Generate one sitemap record per store view (Marketing > SEO & Search > Site Map). Each writes to its own file, e.g., sitemap_en.xml, sitemap_fr.xml. Reference them from a root sitemap_index.xml you write manually. If store views share a domain, enable "Add Store Code to URLs" so each view gets its own URL path. For hreflang, the core sitemap output does not include alternate-language annotations by default, so plan on a third-party module such as Mageplaza Hreflang Tags or a theme customization. Submit each store's sitemap to a matching GSC property.
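The hand-written root index can be produced with a few lines of shell. This is a minimal sketch following the sitemaps.org index format: write_sitemap_index is a hypothetical helper, and the base URL and file names are the example values used above, not anything Magento generates for you.

```shell
#!/bin/sh
# write_sitemap_index BASE_URL FILE...
# Hypothetical helper: print a sitemap index that references each
# per-store-view sitemap file under BASE_URL.
write_sitemap_index() {
  local base_url file
  base_url=${1%/}          # drop one trailing slash so we never emit "//"
  shift
  printf '%s\n' '<?xml version="1.0" encoding="UTF-8"?>'
  printf '%s\n' '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
  for file in "$@"; do
    printf '  <sitemap><loc>%s/%s</loc></sitemap>\n' "$base_url" "$file"
  done
  printf '%s\n' '</sitemapindex>'
}
```

Typical use would be write_sitemap_index https://store.com sitemap_en.xml sitemap_fr.xml > sitemap_index.xml, rerun whenever a store view is added or removed. The helper does no XML escaping, so it assumes file names without characters such as "&".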
Step-by-Step Fix Guide
- In Stores > Configuration > Catalog > XML Sitemap, set per-entity priority and frequency; generate per store view
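- Enable Canonical Meta Tag for Products and Categories so duplicate URL variants consolidate to one canonical URL; keep simple children set to "Not Visible Individually" so only the configurable parent has a storefront URL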
- Disable "Use Categories Path for Product URLs" unless you explicitly want nested URLs
- Add robots.txt rules for layered nav, search, checkout, customer, wishlist
- Create one sitemap record per store view in Marketing > SEO & Search > Site Map
- Confirm cron is running (bin/magento cron:run) and sitemap jobs complete
- Verify with curl https://store.com/sitemap.xml - should return 200 and match expected URL count
- Submit each store-view sitemap to its matching GSC property
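For the verification step, the URL count can be checked from the command line once the sitemap has been downloaded. A minimal sketch, assuming the file is a plain (non-index, uncompressed) sitemap saved locally; count_sitemap_urls is a hypothetical helper name.

```shell
#!/bin/sh
# count_sitemap_urls SITEMAP_FILE
# Hypothetical helper: print the number of <loc> entries in a sitemap file.
count_sitemap_urls() {
  # grep -o prints each match on its own line, so the count is right even if
  # the generator wrote the whole document on one line.
  grep -o '<loc>' "$1" | wc -l | tr -d ' '
}
```

A possible workflow: curl -s -o /tmp/sitemap.xml -w '%{http_code}\n' https://store.com/sitemap.xml prints the HTTP status while saving the body, and count_sitemap_urls /tmp/sitemap.xml then gives the number to compare against the expected count of enabled, visible products, categories, and CMS pages.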