Fix Your Sitemap for Shopify
Shopify generates /sitemap.xml automatically as a sitemap index referencing products, collections, pages, and blogs. The default is mostly fine - except when it isn't. Product variants, unpublished-but-linked items, and the myshopify.com preview domain all produce subtle indexing problems.
Shopify's sitemap is among the more restrictive in ecommerce: you can't edit the file, can't add fields, and there is no per-product sitemap toggle in the admin (the closest workaround is the seo.hidden metafield, which adds noindex and drops the URL from the sitemap). What you do have is product visibility, collection settings, and robots.txt.liquid - and those three levers cover most real problems.
I audited a Shopify Plus jewelry store last quarter. It had 4,200 active products and sitemap_products_1.xml listed all 4,200 (good), but GSC reported 7,800 URLs under "Crawled - currently not indexed". The culprit: product variant URLs such as ?variant=123456 were linked internally from the product gallery, crawled, and canonicalized to the parent product - but Google still spent crawl budget on them. A robots.txt rule blocking the variant parameter fixed it.
Shopify sitemap structure
https://yourstore.com/sitemap.xml    # index
|- sitemap_products_1.xml            # first 5,000 products
|- sitemap_products_2.xml            # next 5,000
|- sitemap_collections_1.xml
|- sitemap_pages_1.xml
|- sitemap_blogs_1.xml
Each sub-sitemap caps at 5,000 URLs, and Shopify paginates automatically past that. Submit only the index URL to GSC; Google discovers the child sitemaps from it.
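To compare what the sitemap lists against what GSC reports, it can help to count entries per child sitemap. The following is a minimal sketch using only the Python standard library; the sample XML is made up for illustration, and the function names are hypothetical rather than part of any Shopify tooling.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org 0.9 protocol.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def child_sitemaps(index_xml: str) -> list[str]:
    """Return the <loc> of every child sitemap listed in a sitemap index."""
    root = ET.fromstring(index_xml)
    return [loc.text.strip() for loc in root.findall("sm:sitemap/sm:loc", NS) if loc.text]


def count_urls(sitemap_xml: str) -> int:
    """Count the <url> entries in a single child sitemap."""
    root = ET.fromstring(sitemap_xml)
    return len(root.findall("sm:url", NS))


# Illustrative documents only; a real run would fetch these from the store.
SAMPLE_INDEX = """<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap><loc>https://yourstore.com/sitemap_products_1.xml</loc></sitemap>
  <sitemap><loc>https://yourstore.com/sitemap_pages_1.xml</loc></sitemap>
</sitemapindex>"""

SAMPLE_CHILD = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://yourstore.com/products/ring</loc></url>
  <url><loc>https://yourstore.com/products/necklace</loc></url>
</urlset>"""

if __name__ == "__main__":
    for loc in child_sitemaps(SAMPLE_INDEX):
        print(loc)
    print(count_urls(SAMPLE_CHILD))
```

Running the per-file count against each child sitemap gives a baseline to set beside the per-sitemap coverage numbers in GSC.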
Common Shopify Sitemap Issues
- Product variant URLs (?variant=) crawled but not in the sitemap, wasting crawl budget
- Collection filter URLs (?pf_st_color=red) crawled and producing duplicate content
- Archived products still linked from old URLs, returning 404 when crawled
- The *.myshopify.com preview URL indexed alongside the custom domain
- Draft blog posts hitting the sitemap on Online Store 2.0 themes if the blog is set to public before content is ready
- Alternate language markets each emitting their own sitemap without hreflang coordination
- Pages excluded from navigation but still indexable (e.g., /policies/privacy-policy clones)
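Several of these issues are visible in the URL alone, so a crawl export or a GSC URL list can be triaged mechanically. The sketch below buckets URLs by hostname and query parameter names; the bucket labels and the `classify` helper are hypothetical, and the parameter prefixes are simply the ones mentioned in this article.

```python
from urllib.parse import parse_qsl, urlsplit

# Filter-app parameter prefixes taken from the examples in this article.
FILTER_PREFIXES = ("pf_st_", "pf_pt_")


def classify(url: str) -> str:
    """Assign a URL to one rough issue bucket based on host and query keys."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    keys = [key for key, _ in parse_qsl(parts.query, keep_blank_values=True)]

    if host.endswith(".myshopify.com"):
        return "myshopify-domain"
    if "variant" in keys:
        return "variant"
    if any(key.startswith(FILTER_PREFIXES) for key in keys):
        return "collection-filter"
    return "ok"


if __name__ == "__main__":
    for sample in (
        "https://yourstore.com/products/ring?variant=123456",
        "https://yourstore.com/collections/rings?pf_st_color=red",
        "https://yourstore.myshopify.com/products/ring",
        "https://yourstore.com/products/ring",
    ):
        print(classify(sample), sample)
```

Counting URLs per bucket shows which problem accounts for most of the excess crawled URLs before you change anything.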
robots.txt.liquid for filter and variant URLs
Shopify lets you edit robots.txt.liquid in the theme code. This is the only way to block crawling of specific URL patterns. Example:
{% for group in robots.default_groups %}
{{- group.user_agent }}
{%- for rule in group.rules -%}
{{ rule }}
{% endfor -%}
{%- comment %} Block variant and filter URLs {% endcomment -%}
Disallow: /*?variant=
Disallow: /*?pf_st_
Disallow: /*?pf_pt_
Disallow: /*?_pos=
Disallow: /*?_sid=
Disallow: /*?_ss=
{%- if group.sitemap != blank %}
{{ group.sitemap }}
{%- endif %}
{% endfor %}

Note that {{ group.sitemap }} renders the complete Sitemap: line by itself, so it needs no prefix. Edit the file at Online Store > Themes > Edit code, under Templates > robots.txt.liquid (you may need to create it). Changes apply immediately.
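Before deploying rules like these, it is worth checking which URLs they actually match. The sketch below approximates Google-style robots.txt matching (prefix match, `*` as a wildcard, a trailing `$` as an end anchor); `rule_matches` is a hypothetical helper, not a full robots.txt parser, and it ignores Allow/Disallow precedence.

```python
import re


def rule_matches(pattern: str, path_and_query: str) -> bool:
    """Return True if a Disallow pattern matches the given path plus query.

    '*' matches any run of characters, a trailing '$' anchors the end,
    and everything else is a literal prefix match from the start.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape literal pieces and join them with '.*' in place of each '*'.
    regex = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    if anchored:
        regex += "$"
    return re.match(regex, path_and_query) is not None


if __name__ == "__main__":
    checks = [
        ("/*?variant=", "/products/ring?variant=123456"),
        ("/*?variant=", "/products/ring?color=red&variant=123456"),
        ("/*?*variant=", "/products/ring?color=red&variant=123456"),
        ("/*?pf_st_", "/collections/rings?pf_st_color=red"),
    ]
    for pattern, target in checks:
        print(pattern, target, rule_matches(pattern, target))
```

One thing this makes visible: `/*?variant=` only matches when `variant` is the first query parameter. Under this matcher, a broader pattern such as `/*?*variant=` also catches it in later positions, at the cost of matching any parameter whose name ends in `variant`.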
Unpublishing and the myshopify.com domain
To remove a product from the sitemap, remove it from the Online Store sales channel or archive the product entirely. Both take effect on the next cache cycle (usually within an hour). For the *.myshopify.com preview domain, Shopify automatically adds X-Robots-Tag: noindex - you don't need to do anything, but verify with curl -I https://yourstore.myshopify.com/ and look for the header (a 301 redirect to your primary domain is an equally good result). If you see neither, open a Shopify support ticket.
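The same header check can be scripted for repeated audits. This is a rough sketch using only the standard library; `head` and `is_kept_out_of_index` are hypothetical names, and treating a 3xx redirect as acceptable is an assumption of this sketch rather than something the curl check above verifies.

```python
import urllib.error
import urllib.request


class _NoRedirect(urllib.request.HTTPRedirectHandler):
    """Surface 3xx responses instead of silently following them."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None


def head(url: str) -> tuple[int, dict[str, str]]:
    """Send a HEAD request and return (status, headers) without following redirects."""
    opener = urllib.request.build_opener(_NoRedirect)
    request = urllib.request.Request(url, method="HEAD")
    try:
        with opener.open(request, timeout=10) as response:
            return response.status, dict(response.headers.items())
    except urllib.error.HTTPError as err:
        # Unfollowed redirects and other non-2xx statuses arrive here.
        return err.code, dict(err.headers.items())


def is_kept_out_of_index(status: int, headers: dict[str, str]) -> bool:
    """True if the response redirects elsewhere or carries X-Robots-Tag: noindex."""
    lowered = {name.lower(): value for name, value in headers.items()}
    if 300 <= status < 400 and "location" in lowered:
        return True  # assumption: a redirect to the primary domain is acceptable
    return "noindex" in lowered.get("x-robots-tag", "").lower()


if __name__ == "__main__":
    status, headers = head("https://yourstore.myshopify.com/")
    print(status, is_kept_out_of_index(status, headers))
```

Keeping the decision logic in a pure function makes it easy to test against saved responses without hitting the store.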
Markets and hreflang
Shopify Markets gives you separate URLs per region (yourstore.com/en-gb/, yourstore.com/en-au/). Each market gets its own sitemap index, and Shopify emits hreflang alternates in the HTML head automatically. What it does not do: emit xhtml:link alternates inside the sitemap XML. For large multi-market stores, consider a supplemental sitemap with explicit alternates submitted separately in GSC.
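A supplemental sitemap of this kind lists every market URL and repeats the full set of alternates under each one. The sketch below generates that structure with the standard library; `build_hreflang_sitemap` is a hypothetical helper, and the product paths and language codes are placeholders based on the market URLs above.

```python
import xml.etree.ElementTree as ET

SM = "http://www.sitemaps.org/schemas/sitemap/0.9"
XHTML = "http://www.w3.org/1999/xhtml"


def build_hreflang_sitemap(groups: list[dict[str, str]]) -> str:
    """Build a urlset where each URL lists all of its hreflang alternates.

    Each item in `groups` maps an hreflang code to the URL of the same page
    in that market, e.g. {"en-gb": "...", "en-au": "..."}.
    """
    ET.register_namespace("", SM)
    ET.register_namespace("xhtml", XHTML)
    urlset = ET.Element(f"{{{SM}}}urlset")
    for alternates in groups:
        for url in alternates.values():
            entry = ET.SubElement(urlset, f"{{{SM}}}url")
            ET.SubElement(entry, f"{{{SM}}}loc").text = url
            # Every entry carries the complete set, including itself.
            for lang, href in alternates.items():
                ET.SubElement(
                    entry,
                    f"{{{XHTML}}}link",
                    {"rel": "alternate", "hreflang": lang, "href": href},
                )
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    print(build_hreflang_sitemap([
        {
            "en-gb": "https://yourstore.com/en-gb/products/ring",
            "en-au": "https://yourstore.com/en-au/products/ring",
        }
    ]))
```

The output has no XML declaration, so prepend one when writing the file. Since Shopify does not let you add files to the generated sitemap, where you host the result is a separate decision.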
Step-by-Step Fix Guide
- Submit yourstore.com/sitemap.xml to GSC (not .myshopify.com)
- Edit robots.txt.liquid to block variant and filter parameter URLs
- Archive or unpublish products you don't want indexed
- Review blog posts set to "Visible" - anything not ready should be a draft
- For Markets, confirm hreflang is emitted in HTML head via view-source: inspection
- Check that *.myshopify.com returns X-Robots-Tag: noindex
- Verify the sitemap with curl https://yourstore.com/sitemap.xml
- Monitor GSC coverage per sub-sitemap