By SitemapFixer Team
Published April 2026 · 10 min read

XML Sitemap Index: When You Need One and How to Structure It Correctly


A sitemap index file is not a sitemap — it's a file that points to sitemaps. This distinction matters because misunderstanding what a sitemap index does leads to some of the most persistent indexing problems we see: malformed XML that parsers silently reject, lastmod dates that mislead crawlers, and nested indexes that violate the spec. This guide covers when to use a sitemap index, exactly how to structure one, and the mistakes that break them.

When Do You Actually Need a Sitemap Index?

The XML sitemap specification allows a maximum of 50,000 URLs per sitemap file and a maximum uncompressed file size of 50MB. If your site exceeds either of these limits, you need multiple sitemaps — and a sitemap index to tie them together. For most sites under 10,000 pages, a single sitemap.xml is sufficient.
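Both limits can be written as a quick check. This is a minimal Python sketch. The function names are hypothetical, and the byte limit uses the 52,428,800-byte figure that sitemaps.org gives for 50MB. The child-sitemap count only accounts for the URL limit, so a file with very long URLs could still need further splitting to stay under the byte limit.

```python
import math

# Per-file limits from the sitemaps.org protocol (size is measured uncompressed).
MAX_URLS_PER_SITEMAP = 50_000
MAX_BYTES_PER_SITEMAP = 52_428_800  # 50MB


def needs_sitemap_index(url_count: int, uncompressed_bytes: int) -> bool:
    """Return True if a single sitemap would break either limit."""
    return url_count > MAX_URLS_PER_SITEMAP or uncompressed_bytes > MAX_BYTES_PER_SITEMAP


def min_child_sitemaps(url_count: int) -> int:
    """Smallest number of child sitemaps that keeps every file under the URL limit."""
    return max(1, math.ceil(url_count / MAX_URLS_PER_SITEMAP))
```

For example, a site with 120,000 URLs needs an index and at least three child sitemaps.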

Beyond the hard limits, there are practical reasons to use a sitemap index even on smaller sites:

  • Separating content types for better reporting. If you submit sitemap-posts.xml, sitemap-products.xml, and sitemap-categories.xml via an index, Google Search Console (GSC) reports on each child sitemap separately, so you can compare submitted and indexed counts per content type. This makes diagnosing problems much faster, because you immediately know which content type has an indexing problem.
  • Incremental updates. On very large sites, regenerating a 50,000-URL sitemap from scratch every time content is published is slow and invalidates any cached copy of the file. With an index, you can regenerate only the child sitemap that contains the new content and leave the others cached.
  • Multiple content paths or subdirectories. A multi-language site might separate sitemap-en.xml and sitemap-es.xml, making it easier to audit coverage per locale.

If none of these scenarios apply to you, a single sitemap.xml is cleaner and has less surface area for bugs.

The Correct Structure of a Sitemap Index File

A sitemap index uses a different root element than a regular sitemap, but the same XML namespace. Here is the minimal valid structure:

<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://yourdomain.com/sitemap-posts.xml</loc>
    <lastmod>2026-04-22</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://yourdomain.com/sitemap-products.xml</loc>
    <lastmod>2026-04-20</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://yourdomain.com/sitemap-categories.xml</loc>
    <lastmod>2026-04-15</lastmod>
  </sitemap>
</sitemapindex>

And a corresponding child sitemap (sitemap-posts.xml):

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://yourdomain.com/blog/my-first-post</loc>
    <lastmod>2026-04-21</lastmod>
  </url>
  <url>
    <loc>https://yourdomain.com/blog/another-post</loc>
    <lastmod>2026-04-18</lastmod>
  </url>
</urlset>

Note the difference: the index file uses the <sitemapindex> root element, while child sitemaps use <urlset>. Both use the same XML namespace. Both files must be UTF-8 encoded, and the XML declaration must be the very first thing in the file, with no blank lines or stray characters before it.
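If you generate these files in code, a real XML serializer avoids hand-built string bugs. The sketch below uses Python's standard xml.etree.ElementTree (xml_declaration requires Python 3.8+). The function names and the (url, lastmod) tuple format are assumptions for illustration. It shows that both builders share one namespace and differ only in the root and entry elements. ElementTree also entity-escapes characters such as & in URLs, which the protocol requires.

```python
import xml.etree.ElementTree as ET

# Both file types share this namespace; only the root element differs.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def _serialize(root: ET.Element) -> bytes:
    # UTF-8 output with the XML declaration as the very first bytes.
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)


def _build(root_tag: str, entry_tag: str, entries) -> bytes:
    root = ET.Element(root_tag, xmlns=SITEMAP_NS)
    for loc, lastmod in entries:
        entry = ET.SubElement(root, entry_tag)
        ET.SubElement(entry, "loc").text = loc
        if lastmod:  # lastmod is optional; omit it rather than invent a date
            ET.SubElement(entry, "lastmod").text = lastmod
    return _serialize(root)


def build_sitemap_index(entries) -> bytes:
    """entries: iterable of (child_sitemap_url, lastmod_or_None)."""
    return _build("sitemapindex", "sitemap", entries)


def build_urlset(entries) -> bytes:
    """entries: iterable of (page_url, lastmod_or_None)."""
    return _build("urlset", "url", entries)
```

Usage: build_sitemap_index([("https://yourdomain.com/sitemap-posts.xml", "2026-04-22")]) returns the bytes of a one-entry index ready to write to disk.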

Common Sitemap Index Mistakes

Wrong or Missing XML Namespace

The namespace http://www.sitemaps.org/schemas/sitemap/0.9 must be declared exactly as written on the root element. A common mistake is using the HTTPS version of the namespace URL:

<!-- WRONG — https namespace does not exist -->
<sitemapindex xmlns="https://www.sitemaps.org/schemas/sitemap/0.9">

<!-- CORRECT — http namespace -->
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">

Google may tolerate this specific error, but schema validators will reject it, and other crawlers may not accept it. Always use the http namespace exactly as the sitemaps.org protocol specifies.
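Because ElementTree reports namespaced tags as {namespace}localname, the root namespace is easy to check programmatically. A minimal Python sketch, assuming you already have the file's bytes; the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

EXPECTED_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def split_tag(tag: str):
    """Split an ElementTree tag like '{namespace}localname' into its parts."""
    if tag.startswith("{"):
        namespace, local = tag[1:].split("}", 1)
        return namespace, local
    return "", tag


def check_root_namespace(xml_bytes: bytes) -> str:
    """Return 'ok' or a short description of what is wrong with the root element."""
    namespace, local = split_tag(ET.fromstring(xml_bytes).tag)
    if local not in ("sitemapindex", "urlset"):
        return f"unexpected root element <{local}>"
    if namespace != EXPECTED_NS:
        return f"wrong namespace {namespace!r}"
    return "ok"
```

A file declaring the https namespace fails with "wrong namespace 'https://www.sitemaps.org/schemas/sitemap/0.9'", and a file with no xmlns at all fails with "wrong namespace ''".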

Using lastmod on the Index to Mean Something It Doesn't

The lastmod on a sitemap index entry refers to when that child sitemap file was last modified — not when the content within it was last modified. This is a subtle but important distinction.

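The wrong pattern: setting lastmod on every child sitemap entry in the index to today's date, regardless of whether those sitemaps actually changed. This tells crawlers that every child sitemap changed, which can trigger needless re-fetches of all of them and waste crawl resources. Google has also said it only relies on lastmod when the values are consistently accurate, so blanket dates can teach it to ignore your lastmod values altogether.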

The correct pattern: update lastmod on a child sitemap's index entry only when that child sitemap file actually changes, for example when URLs are added or removed or when a listed URL's own lastmod is updated. If your posts sitemap has been unchanged for 3 days, its lastmod in the index should be 3 days old.
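One way to follow the correct pattern is to fingerprint each regenerated child sitemap and only move its index lastmod when the bytes differ. This is a minimal Python sketch. It assumes your generator is deterministic (stable URL ordering) and that you persist the previous digest and lastmod somewhere between runs; that storage, and the function name, are hypothetical.

```python
import hashlib
from datetime import date


def next_lastmod(child_bytes: bytes, previous_digest, previous_lastmod, today: date):
    """Return (lastmod, digest) for one child sitemap's entry in the index.

    lastmod only moves forward when the regenerated child file differs
    byte-for-byte from the version that was last published.
    """
    digest = hashlib.sha256(child_bytes).hexdigest()
    if digest == previous_digest and previous_lastmod:
        return previous_lastmod, digest  # unchanged file: keep the old date
    return today.isoformat(), digest  # new or changed file: use today's date
```

Because the comparison covers the whole file, a changed <lastmod> on a single URL inside the child sitemap also counts as a change, which matches what the index lastmod is supposed to describe.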

Nesting Sitemap Indexes Inside Each Other

The sitemaps.org protocol does not allow sitemap index files to reference other sitemap index files. A sitemap index can only reference regular sitemaps (<urlset> documents). Google does not support nested sitemap indexes either, so an entry that points to another index will not be processed the way you intend.

<!-- WRONG — do not reference another index from an index -->
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://yourdomain.com/sitemap-index-2.xml</loc>
  </sitemap>
</sitemapindex>

<!-- CORRECT — only reference urlset sitemaps -->
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://yourdomain.com/sitemap-posts-1.xml</loc>
  </sitemap>
  <sitemap>
    <loc>https://yourdomain.com/sitemap-posts-2.xml</loc>
  </sitemap>
</sitemapindex>
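Nested indexes are easy to detect by fetching each child and looking at its root element. A minimal Python sketch: the fetch callable is a hypothetical stand-in you supply (in production it might wrap an HTTP client, in a test it can read from a dict), and the function names are illustrative.

```python
import xml.etree.ElementTree as ET

NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def child_sitemap_urls(index_bytes: bytes):
    """Return every <loc> listed under <sitemap> entries in an index."""
    root = ET.fromstring(index_bytes)
    return [(loc.text or "").strip() for loc in root.iterfind(f"{NS}sitemap/{NS}loc")]


def find_nested_indexes(index_bytes: bytes, fetch):
    """Return child URLs whose root element is another <sitemapindex>.

    fetch(url) -> bytes is supplied by the caller (hypothetical).
    """
    nested = []
    for url in child_sitemap_urls(index_bytes):
        if ET.fromstring(fetch(url)).tag == f"{NS}sitemapindex":
            nested.append(url)
    return nested
```

Any URL this returns should be replaced in the index by the <urlset> sitemaps that the nested index points to.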

Child Sitemaps Located Outside the Index Domain

Under the sitemaps.org protocol, a sitemap index at https://yourdomain.com/sitemap.xml may only reference child sitemaps on the same site, so a child on https://cdn.yourdomain.com/ or any other subdomain or domain is out of spec. Google can process cross-site references, but only if you have verified ownership of every host involved in GSC. That makes the setup confusing to maintain, and it makes it easy to overlook a child sitemap that has gone missing.

Keep all child sitemaps on the same domain as the index. For CDN-hosted media, keep the sitemaps on your main domain as well: page URLs in <loc> must stay on that domain, but Google's image sitemap format allows the image URLs in <image:loc> entries to point to a CDN.
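A same-host check only needs the standard library's URL parser. This Python sketch compares scheme and host (including any port) case-insensitively, following the protocol's requirement that sitemap URLs share one protocol and host. The function names are hypothetical.

```python
from urllib.parse import urlsplit


def _origin(url: str):
    parts = urlsplit(url)
    # Scheme and host (plus any port) must match; hostnames are case-insensitive.
    return parts.scheme.lower(), parts.netloc.lower()


def off_host_children(index_url: str, child_urls):
    """Return child sitemap URLs that are not on the same scheme and host as the index."""
    origin = _origin(index_url)
    return [url for url in child_urls if _origin(url) != origin]
```

Run it over the <loc> values from your index. An empty list means every child sitemap sits on the index's own host.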

How Google Handles Sitemap Indexes Differently From Single Sitemaps

When Google encounters a sitemap index, it queues each referenced child sitemap for fetching separately. This means:

  • Child sitemaps are not all fetched at once. Google may fetch some immediately and delay others based on crawl budget and server response times.
  • If a child sitemap returns a server error, Google marks only that child as failing — not the entire index. The GSC Sitemaps report shows separate status for each child sitemap when submitted via an index.
  • Google may use the lastmod values on the index entries when deciding which child sitemaps to re-fetch, so child sitemaps with old lastmod values may not be re-fetched on every crawl.

This per-child-sitemap granularity is one of the main practical benefits of using an index on large sites — you can isolate indexing problems to specific content types without the noise of a single monolithic sitemap.

How to Structure Sitemaps by Content Type

For a typical content + e-commerce site, a clean split by content type looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<!-- sitemap.xml (the index) -->
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <!-- Core pages: homepage, about, contact, etc. -->
  <sitemap>
    <loc>https://yourdomain.com/sitemaps/sitemap-pages.xml</loc>
    <lastmod>2026-04-20</lastmod>
  </sitemap>
  <!-- Blog posts paginated 5,000 per sitemap -->
  <sitemap>
    <loc>https://yourdomain.com/sitemaps/sitemap-posts-0.xml</loc>
    <lastmod>2026-04-22</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://yourdomain.com/sitemaps/sitemap-posts-1.xml</loc>
    <lastmod>2026-04-19</lastmod>
  </sitemap>
  <!-- Product catalog -->
  <sitemap>
    <loc>https://yourdomain.com/sitemaps/sitemap-products.xml</loc>
    <lastmod>2026-04-21</lastmod>
  </sitemap>
  <!-- Category and tag archive pages -->
  <sitemap>
    <loc>https://yourdomain.com/sitemaps/sitemap-categories.xml</loc>
    <lastmod>2026-04-15</lastmod>
  </sitemap>
  <!-- Image sitemap for media-heavy content -->
  <sitemap>
    <loc>https://yourdomain.com/sitemaps/sitemap-images.xml</loc>
    <lastmod>2026-04-21</lastmod>
  </sitemap>
</sitemapindex>

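Note that the child sitemaps live in a /sitemaps/ subdirectory rather than the root. This keeps your root clean, but be aware of the path rule: the sitemaps.org protocol says a sitemap may only list URLs at or below the directory it is stored in, and Google's documentation likewise notes that a sitemap affects only URLs below its parent directory and recommends the site root. Placing child sitemaps at the root is the safer default.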
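The numbered children in the example above (sitemap-posts-0.xml, sitemap-posts-1.xml) come from splitting one content type's URLs into fixed-size pages. A minimal Python sketch, assuming the 5,000-per-file page size from the example. The base URL, naming scheme, and function name are illustrative only.

```python
def paginate_sitemaps(urls, name, per_file=5_000, base="https://yourdomain.com/sitemaps/"):
    """Split one content type's URLs into numbered child sitemaps.

    Returns {child_sitemap_url: [page URLs]} in order, e.g. sitemap-posts-0.xml,
    sitemap-posts-1.xml, ... The base URL and naming scheme are illustrative.
    """
    if not 0 < per_file <= 50_000:
        raise ValueError("per_file must be between 1 and 50,000")
    urls = list(urls)
    return {
        f"{base}sitemap-{name}-{page}.xml": urls[start:start + per_file]
        for page, start in enumerate(range(0, len(urls), per_file))
    }
```

With 12,001 posts this yields three children holding 5,000, 5,000, and 2,001 URLs. A smaller page size than the 50,000 maximum keeps each file cheap to regenerate when only recent posts change.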

Submitting a Sitemap Index to Google Search Console

Submit only the index URL, not the individual child sitemaps. Go to GSC > Indexing > Sitemaps > Add a new sitemap, and enter the full URL of your sitemap index (e.g. https://yourdomain.com/sitemap.xml).

After a successful submission, the index appears in the Sitemaps table, and opening it lists each child sitemap with its own status, last read date, and discovered URL count. To see how many URLs from a given child sitemap are indexed, filter the Page indexing report by that sitemap. This per-sitemap reporting is where the organizational benefit of using an index becomes visible.

Also reference your sitemap index in robots.txt:

User-agent: *
Allow: /

Sitemap: https://yourdomain.com/sitemap.xml

The robots.txt Sitemap directive should always point to your sitemap index if you have one, not individual child sitemaps. Googlebot uses this directive to discover your sitemap without needing an explicit GSC submission.
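To confirm what crawlers will discover, you can read the Sitemap lines back out of robots.txt. Python's standard urllib.robotparser exposes them through RobotFileParser.site_maps() (Python 3.8+). The wrapper function name is hypothetical.

```python
from urllib.robotparser import RobotFileParser


def sitemaps_from_robots(robots_txt: str):
    """Return the Sitemap: URLs declared in a robots.txt body (empty list if none)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when no Sitemap line is present.
    return parser.site_maps() or []
```

If the returned list names a child sitemap instead of your index, update the directive to point at the index.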

Validating Your Sitemap Index

Before submitting, validate both the index file and each child sitemap:

  1. Fetch the index URL directly and verify the XML is well-formed — no encoding errors, no stray characters before the XML declaration, correct root element.
  2. Click through to each child sitemap URL in the index and verify they all return 200 with valid XML.
  3. Count the URLs in each child sitemap and verify none exceed 50,000.
  4. Verify the index itself doesn't exceed 50,000 child sitemap entries (this limit applies to the index as well).
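  5. Validate each file against the sitemaps.org XML schemas (siteindex.xsd for the index, sitemap.xsd for child sitemaps) with a schema-aware tool such as xmllint, e.g. xmllint --noout --schema siteindex.xsd sitemap.xml. The W3C Markup Validator and Google's Rich Results Test are built for HTML and structured data, not sitemaps.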
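Steps 1 through 4 can be automated with the standard library. Schema validation needs an XSD-aware tool and is not covered here. This is a minimal Python sketch. The fetch callable returning (status_code, body_bytes) is a hypothetical stand-in for whatever HTTP client you use, and the function name is illustrative. A stray character before the XML declaration surfaces as a parse error in step 1.

```python
import xml.etree.ElementTree as ET

NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"
MAX_ENTRIES = 50_000  # applies to URLs per sitemap and to sitemaps per index


def validate_index(index_bytes: bytes, fetch):
    """Return a list of problems found; an empty list means the checks passed.

    fetch(url) -> (status_code, body_bytes) is supplied by the caller (hypothetical).
    """
    try:
        root = ET.fromstring(index_bytes)  # step 1: well-formed XML
    except ET.ParseError as exc:
        return [f"index is not well-formed XML: {exc}"]
    if root.tag != NS + "sitemapindex":  # step 1: correct root element
        return [f"unexpected root element {root.tag}"]

    problems = []
    locs = [(el.text or "").strip() for el in root.iter(NS + "loc")]
    if len(locs) > MAX_ENTRIES:  # step 4: index entry limit
        problems.append(f"index lists {len(locs)} sitemaps (max {MAX_ENTRIES})")

    for loc in locs:
        status, body = fetch(loc)
        if status != 200:  # step 2: child must return 200
            problems.append(f"{loc}: HTTP {status}")
            continue
        try:
            child = ET.fromstring(body)  # step 2: child must be valid XML
        except ET.ParseError as exc:
            problems.append(f"{loc}: malformed XML ({exc})")
            continue
        if child.tag != NS + "urlset":
            problems.append(f"{loc}: root is {child.tag}, expected urlset")
            continue
        count = len(child.findall(NS + "url"))
        if count > MAX_ENTRIES:  # step 3: URLs per child sitemap
            problems.append(f"{loc}: {count} URLs (max {MAX_ENTRIES})")
    return problems
```

Because the fetch function is injected, the same checks can run against live URLs in a scheduled job or against files on disk before deployment.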