Sitemap Size Limit: 50,000 URLs and 50MB Explained
The Sitemap Size Limits
The sitemaps.org protocol, which Google follows, sets two hard limits for each sitemap file: a maximum of 50,000 URLs and a maximum file size of 50MB (52,428,800 bytes) uncompressed. If a file exceeds either limit, Google may report an error for it and may not process all of the URLs it contains. Split the file before it reaches whichever limit comes first.
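You can check an existing file against both limits before submitting it by counting its `<url>` entries and measuring its raw byte size. The sketch below uses Python's standard-library `xml.etree.ElementTree`. The function name `check_sitemap_limits` and the sample URLs are illustrative. Because it loads the whole file into memory, a very large sitemap may be better handled with `ET.iterparse`.

```python
import xml.etree.ElementTree as ET

# Per-file limits from the sitemaps.org protocol.
MAX_URLS = 50_000
MAX_BYTES = 52_428_800  # 50MB, measured on the uncompressed XML

NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def check_sitemap_limits(xml_bytes: bytes) -> dict:
    """Count <url> entries in an uncompressed sitemap and compare with both limits."""
    root = ET.fromstring(xml_bytes)
    url_count = len(root.findall(f"{NS}url"))
    size = len(xml_bytes)
    return {
        "url_count": url_count,
        "size_bytes": size,
        "within_url_limit": url_count <= MAX_URLS,
        "within_size_limit": size <= MAX_BYTES,
    }


# Usage with a tiny sample (placeholder URLs).
sample = b"""<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://yoursite.com/</loc></url>
  <url><loc>https://yoursite.com/about</loc></url>
</urlset>
"""
print(check_sitemap_limits(sample))
```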
In practice, the 50,000 URL limit is almost always the binding constraint. A typical URL entry including <loc>, <lastmod>, and <changefreq> is around 150-200 bytes. At that size, a 50MB file holds roughly 260,000 to 350,000 URLs, far more than 50,000. The file size limit only becomes relevant if your URLs are very long (for example, query-string-heavy URLs) or if each entry carries many optional elements, such as image or hreflang annotations.
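The capacity figure comes from a single integer division of the byte limit by the per-entry size. This sketch first measures one compact example entry, using a hypothetical URL with no indentation, and then computes the ceiling for 150-byte and 200-byte entries. Indentation, longer URLs, and extra elements push real entries toward the larger sizes.

```python
MAX_BYTES = 52_428_800  # 50MB limit, uncompressed


def entries_that_fit(bytes_per_entry: int) -> int:
    """How many entries of this size fit under the byte limit (ignoring header/footer)."""
    return MAX_BYTES // bytes_per_entry


# Measure a compact example entry; the URL is a placeholder.
entry = (
    "<url>"
    "<loc>https://yoursite.com/products/blue-widget</loc>"
    "<lastmod>2026-04-27</lastmod>"
    "<changefreq>weekly</changefreq>"
    "</url>\n"
)
entry_size = len(entry.encode("utf-8"))
print(entry_size, entries_that_fit(entry_size))

print(entries_that_fit(150))  # 349525
print(entries_that_fit(200))  # 262144
```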
Most sites with fewer than 50,000 indexable pages will never need to worry about these limits. However, e-commerce sites with large product catalogs, news sites with many articles, and large SaaS platforms frequently exceed the URL limit and need to implement sitemap index files.
Sitemap Index Files
When your site has more URLs than a single sitemap file can hold, create a sitemap index file. This is an XML file that lists multiple child sitemap files rather than listing URLs directly. The sitemap index uses a <sitemapindex> root element containing one <sitemap> element per child file. Each <sitemap> element has a <loc> pointing to the child sitemap and an optional <lastmod> date.
```xml
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://yoursite.com/sitemap-blog.xml</loc>
    <lastmod>2026-04-27</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://yoursite.com/sitemap-products.xml</loc>
    <lastmod>2026-04-27</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://yoursite.com/sitemap-pages.xml</loc>
    <lastmod>2026-04-27</lastmod>
  </sitemap>
</sitemapindex>
```

Submit the sitemap index URL to Google Search Console. You do not need to submit the child sitemaps individually, because Google follows the index to discover and process every child sitemap it lists. The index file has its own limits, matching those for a regular sitemap: it can list at most 50,000 child sitemaps and must not exceed 50MB (52,428,800 bytes) uncompressed.
How to Split Your Sitemap
The best way to split a large sitemap is by content type. This makes it easy to monitor indexing rates by segment in Google Search Console — if your product pages are indexing at 60% but your blog posts are at 90%, you immediately know where the problem is concentrated.
Recommended split structure for large sites:
- sitemap-pages.xml: static pages (homepage, about, contact, landing pages)
- sitemap-blog.xml: blog posts and articles
- sitemap-products.xml: product pages
- sitemap-categories.xml: category and collection pages
- sitemap-images.xml: image sitemaps (if applicable)
Most major CMS platforms handle this automatically. WordPress with Yoast SEO generates a sitemap index by content type out of the box. Shopify generates separate sitemaps for products, collections, pages, and blogs. In custom applications and Next.js projects, you may need to implement sitemap splitting manually.
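In a custom application, the core of manual splitting is a loop that starts a new file whenever the next entry would break either limit. The sketch below is hypothetical and is not any framework's API. `split_into_sitemaps` takes the URLs for one content type (for example, all product URLs) and yields complete `<urlset>` documents. You would then save these under names such as `sitemap-products-1.xml` (an illustrative naming scheme) and list them in your sitemap index. For brevity it writes only `<loc>`.

```python
from typing import Iterable, Iterator
from xml.sax.saxutils import escape

MAX_URLS = 50_000
MAX_BYTES = 52_428_800  # uncompressed

HEADER = ('<?xml version="1.0" encoding="UTF-8"?>\n'
          '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n')
FOOTER = "</urlset>\n"


def url_entry(loc: str) -> str:
    # escape() turns &, < and > into entities so query strings stay valid XML
    return f"  <url><loc>{escape(loc)}</loc></url>\n"


def split_into_sitemaps(urls: Iterable[str], max_urls: int = MAX_URLS,
                        max_bytes: int = MAX_BYTES) -> Iterator[str]:
    """Yield complete <urlset> documents that each stay within both limits."""
    overhead = len(HEADER.encode("utf-8")) + len(FOOTER.encode("utf-8"))
    entries, size = [], overhead
    for loc in urls:
        entry = url_entry(loc)
        entry_size = len(entry.encode("utf-8"))
        # Close the current file if adding this entry would break either limit.
        if entries and (len(entries) >= max_urls or size + entry_size > max_bytes):
            yield HEADER + "".join(entries) + FOOTER
            entries, size = [], overhead
        entries.append(entry)
        size += entry_size
    if entries:
        yield HEADER + "".join(entries) + FOOTER


urls = [f"https://yoursite.com/products/{i}" for i in range(120_000)]
files = list(split_into_sitemaps(urls))
print([f.count("<url>") for f in files])  # [50000, 50000, 20000]
```

With short URLs like these, the 50,000-URL cap triggers every split. The byte check only matters for unusually long entries.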
Gzip Compression for Large Sitemaps
You can compress sitemap files using gzip to reduce file transfer size. Use the .xml.gz extension for compressed sitemaps — for example, sitemap.xml.gz. Googlebot supports gzip-compressed sitemaps natively. The 50MB limit refers to the uncompressed size, so compression does not raise this ceiling, but it does significantly reduce bandwidth and load time for large sitemap files.
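There are two ways to serve compressed sitemaps. The first is HTTP compression of a plain sitemap.xml. In Apache, enable mod_deflate and apply it to XML responses. In nginx, turn on the gzip directive and add the XML content type to gzip_types, since nginx compresses only text/html by default. The server then sends the response with a Content-Encoding: gzip header, and the client decompresses it transparently. The second is a pre-compressed sitemap.xml.gz file, which Google downloads and decompresses itself. Serve it as an ordinary gzip file (for example, with Content-Type: application/gzip). A Content-Encoding: gzip header is not needed for Google to read it.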
For a rough sense of scale: at 150-200 bytes per entry, a 50,000-URL sitemap is about 7.5-10MB uncompressed. Because sitemap XML is highly repetitive, gzip typically shrinks it to around 1-2MB, which noticeably reduces transfer time when Googlebot fetches it.
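Python's `gzip` module can pre-compress a sitemap in a few lines. This sketch uses a hypothetical helper, `write_gzipped_sitemap`. It checks the uncompressed size against the 50MB limit before compressing, since compression does not raise that limit. It then writes a `.xml.gz` file to a temporary directory and reports both sizes for a synthetic sample with placeholder URLs.

```python
import gzip
import os
import tempfile

MAX_BYTES = 52_428_800  # the 50MB limit applies to the uncompressed XML


def write_gzipped_sitemap(xml: bytes, path: str) -> int:
    """Gzip xml into path (e.g. sitemap.xml.gz) and return the compressed size."""
    if len(xml) > MAX_BYTES:
        # Compression does not raise the limit, so refuse oversized input.
        raise ValueError("uncompressed sitemap exceeds 50MB; split it first")
    compressed = gzip.compress(xml)
    with open(path, "wb") as f:
        f.write(compressed)
    return len(compressed)


# Usage: a repetitive 10,000-URL sample with placeholder URLs.
sample = (
    b'<?xml version="1.0" encoding="UTF-8"?>\n'
    b'<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
    + b"".join(b"  <url><loc>https://yoursite.com/p/%d</loc></url>\n" % i
               for i in range(10_000))
    + b"</urlset>\n"
)
out_path = os.path.join(tempfile.mkdtemp(), "sitemap.xml.gz")
compressed_size = write_gzipped_sitemap(sample, out_path)
print(f"{len(sample):,} bytes -> {compressed_size:,} bytes")
```

Real sitemaps with varied URLs and lastmod dates will compress less dramatically than this synthetic sample.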
Monitoring Large Sitemaps in Search Console
After submitting a sitemap index with multiple child sitemaps, Google Search Console shows statistics for each child sitemap separately in the Sitemaps report. For each child file, you can see: how many URLs were submitted, how many Google has discovered, and whether there are any fetch errors.
Monitor these metrics regularly:
- Submitted vs. indexed ratio: if you submitted 10,000 product URLs but only 6,000 are indexed, investigate the 4,000-URL gap in the Pages report under Indexing, which can be filtered to the URLs submitted in a specific sitemap.
- Fetch errors: if Google cannot read a child sitemap, it cannot use that file to discover the URLs it lists. Those pages can then be found only through other routes, such as internal links, until the error is fixed.
- Indexing rate by segment: compare indexing rates across child sitemaps. A significantly lower rate for one content type usually points to a quality or crawlability issue with that segment.
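Once you have the submitted and indexed counts for each child sitemap, comparing segments is simple arithmetic. In this sketch the counts are entered by hand, and no Search Console API call is assumed. The product figures are the ones from the example above, and the blog figures are hypothetical. The 80% threshold is an arbitrary assumption that you would tune for your own site.

```python
def indexing_rates(counts, threshold=0.8):
    """counts maps sitemap name -> (submitted, indexed).

    Returns per-segment rates and the names of segments below threshold.
    """
    rates = {
        name: indexed / submitted
        for name, (submitted, indexed) in counts.items()
        if submitted > 0  # skip empty sitemaps to avoid dividing by zero
    }
    low = sorted(name for name, rate in rates.items() if rate < threshold)
    return rates, low


counts = {
    "sitemap-products.xml": (10_000, 6_000),
    "sitemap-blog.xml": (2_000, 1_800),  # hypothetical figures
}
rates, low = indexing_rates(counts)
for name, rate in rates.items():
    print(f"{name}: {rate:.0%}")
print("Investigate:", low)  # Investigate: ['sitemap-products.xml']
```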
What URLs to Include (and Exclude) from Your Sitemap
As your sitemap grows toward the 50,000 URL limit, be disciplined about what you include. The sitemap should only contain URLs that:
- Return a 200 HTTP status code
- Are the canonical version of the page (not a parameter variant or redirect)
- Are indexable (no noindex tag)
- Contain unique, valuable content worth indexing
Exclude from your sitemap: paginated pages beyond page 2 (unless they have unique content), tag and archive pages with thin content, URL parameter variants, pages blocked by robots.txt, and pages with noindex tags. A smaller, cleaner sitemap of high-quality pages will generally result in better indexing rates than a large sitemap that includes everything.
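Most of these criteria can be checked mechanically before a URL goes into the sitemap. Whether content is unique and valuable still takes human judgment. The sketch assumes you already have crawl data for each page (final status code, rel=canonical target, noindex flag), stored in a hypothetical `PageRecord` structure. It uses the standard library's `urllib.robotparser` to apply robots.txt rules. That parser does simple prefix matching and does not implement Google's wildcard extensions, so treat its answer as an approximation.

```python
from dataclasses import dataclass
from typing import List
from urllib.robotparser import RobotFileParser


@dataclass
class PageRecord:
    url: str
    status: int      # final HTTP status code from your own crawl
    canonical: str   # rel=canonical target, or "" if the page declares none
    noindex: bool    # True if a noindex meta tag or X-Robots-Tag header was found


def sitemap_eligible(pages: List[PageRecord], robots_txt: str,
                     user_agent: str = "Googlebot") -> List[str]:
    """Return URLs that pass the mechanical sitemap checks."""
    robots = RobotFileParser()
    robots.parse(robots_txt.splitlines())
    keep = []
    for page in pages:
        if page.status != 200:
            continue  # redirects, 404s, and errors stay out
        if page.canonical and page.canonical != page.url:
            continue  # non-canonical variant (exact string match, no normalization)
        if page.noindex:
            continue
        if not robots.can_fetch(user_agent, page.url):
            continue  # blocked by robots.txt
        keep.append(page.url)
    return keep


pages = [
    PageRecord("https://yoursite.com/", 200, "https://yoursite.com/", False),
    PageRecord("https://yoursite.com/old-page", 301, "", False),
    PageRecord("https://yoursite.com/shoes?color=red", 200, "https://yoursite.com/shoes", False),
    PageRecord("https://yoursite.com/thank-you", 200, "", True),
    PageRecord("https://yoursite.com/cart/checkout", 200, "", False),
]
robots_txt = "User-agent: *\nDisallow: /cart/\n"
print(sitemap_eligible(pages, robots_txt))  # ['https://yoursite.com/']
```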
Keeping Your Sitemap Up to Date
A sitemap full of stale or incorrect URLs can do more harm than good. Every 404 in your sitemap wastes crawl budget. Every non-canonical URL also wastes crawl budget and sends a signal that conflicts with the page's canonical tag. Set up automated sitemap generation so that new pages are added and deleted pages are removed without manual intervention.
Update the <lastmod> date for a URL when its content meaningfully changes, not just when you regenerate the sitemap. Google may use lastmod to prioritize recrawling recently updated content, but only when the values are consistently accurate. If you set lastmod dates inaccurately (for example, updating every date each time the sitemap regenerates), Google may learn to ignore the signal for your site.
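One way to keep lastmod honest in an automated pipeline is to store a fingerprint of each page's content and change the date only when the fingerprint changes. This sketch makes several assumptions. `update_lastmod` and the in-memory dictionary are hypothetical stand-ins for your own storage. It also hashes the main content rather than the full rendered HTML, because templates often contain timestamps or other text that changes on every build.

```python
import hashlib
from datetime import date
from typing import Dict, Tuple


def update_lastmod(store: Dict[str, Tuple[str, str]], url: str,
                   main_content: str, today: date) -> str:
    """Return the lastmod to publish, changing it only when the content hash changes.

    store maps url -> (sha256 of main content, lastmod as YYYY-MM-DD).
    """
    digest = hashlib.sha256(main_content.encode("utf-8")).hexdigest()
    previous = store.get(url)
    if previous is None or previous[0] != digest:
        store[url] = (digest, today.isoformat())  # new page or edited content
    return store[url][1]


store: Dict[str, Tuple[str, str]] = {}
page = "https://yoursite.com/blog/post"
first = update_lastmod(store, page, "Original text", date(2026, 4, 1))
rerun = update_lastmod(store, page, "Original text", date(2026, 4, 27))  # regeneration only
edited = update_lastmod(store, page, "Revised text", date(2026, 4, 27))
print(first, rerun, edited)  # 2026-04-01 2026-04-01 2026-04-27
```

An exact hash treats any edit, even a one-character typo fix, as a change. Deciding what counts as a meaningful change still needs a rule of your own.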