HTTP 4xx/5xx URLs in Your Sitemap
Sitemaps are a trust signal: they tell Google which pages you want indexed. When those URLs return 404 Not Found, 410 Gone, 500 Internal Server Error, or 503 Service Unavailable, you're telling Google to crawl broken pages - and you're damaging your site's quality signals in the process.
What is this error?
Any URL in your sitemap.xml that does not return a 200 OK status when fetched is a broken entry. The four status codes you'll commonly see are: 404 (Not Found), 410 (Gone), 500 (Internal Server Error), and 503 (Service Unavailable). Google Search Console surfaces these under "Submitted URL not found (404)" and "Submitted URL has crawl issue" in the Pages report.
Why does it happen?
404s typically creep in when products are discontinued, blog posts are deleted, or URL slugs change without the sitemap generator being updated. 500s often point to slow database queries or memory limits on specific pages. 503s appear during deployments if your sitemap was generated before the deploy but URLs return 503 during cache warmup. Dynamic sitemaps pulling from stale caches are a major source of all three.
Why does it hurt SEO?
John Mueller has explicitly said that sitemaps with many 404s are treated as low-quality and can reduce how often Google processes the sitemap. For 5xx errors the impact is worse: Googlebot interprets them as server stress and throttles crawl rate across the entire domain. Over time, a dirty sitemap correlates with fewer pages getting discovered and slower indexing of new content.
How to detect it
Sitemap Fixer fetches every URL in your sitemap and reports the HTTP status code. You'll get a line-by-line breakdown of which URLs returned a non-200 status such as 404, 410, 500, or 503, grouped by status. You can also check Search Console → Indexing → Pages for the "Submitted URL not found (404)" and "Submitted URL has crawl issue" reports.
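If you want to run the same kind of check yourself, the sketch below shows one way to do it with the Python standard library. It is not how Sitemap Fixer works internally; the function names (extract_urls, fetch_status, group_by_status) are made up for this example, and it assumes your server answers HEAD requests the same way it answers GET.

```python
import urllib.error
import urllib.request
import xml.etree.ElementTree as ET
from collections import defaultdict

# Namespace used by standard sitemap.xml files.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def extract_urls(sitemap_xml: str) -> list[str]:
    """Return the text of every <loc> element in a sitemap document."""
    root = ET.fromstring(sitemap_xml)
    return [loc.text.strip() for loc in root.iter(f"{SITEMAP_NS}loc") if loc.text]


def fetch_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status for a URL (hypothetical helper).

    urllib follows redirects, so a 301 source reports the status of its
    final destination. Connection failures (URLError) are not handled here.
    """
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code  # urllib raises 4xx/5xx responses as exceptions


def group_by_status(urls, fetch=fetch_status) -> dict[int, list[str]]:
    """Group URLs by the status code that `fetch` reports for each one."""
    groups = defaultdict(list)
    for url in urls:
        groups[fetch(url)].append(url)
    return dict(groups)
```

Calling group_by_status(extract_urls(sitemap_text)) returns a dictionary keyed by status code, so every key other than 200 is a list of entries to investigate. The fetch parameter is injectable so you can swap in a rate-limited or GET-based fetcher.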
How to fix it
1. Export the list of non-200 URLs from Sitemap Fixer or Screaming Frog.
2. For deleted pages: remove from the sitemap AND return 410 Gone on the URL itself.
3. For moved pages: replace the old URL in the sitemap with the new destination (never list the 301 source).
4. For 500/503 errors: debug the underlying server issue before resubmitting - don't just remove the URL.
5. Regenerate the sitemap from a fresh database query, not a cached file.
6. Resubmit the clean sitemap in Search Console and monitor the Pages report for one week.
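Steps 2 to 4 can be sketched as a small script that edits an existing sitemap. This is an illustration, not a replacement for regenerating the sitemap from your database: the clean_sitemap function, the statuses mapping (URL to status code from an earlier crawl), and the moved mapping (old URL to new destination) are all assumptions made for this example.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def clean_sitemap(sitemap_xml, statuses, moved=None):
    """Return (cleaned_xml, needs_debugging).

    statuses: URL -> HTTP status from an earlier crawl (assumed input).
              URLs missing from the mapping are treated as live.
    moved:    old URL -> new destination URL (assumed input).
    """
    moved = moved or {}
    # Keep the sitemap namespace as the default one in the output.
    ET.register_namespace("", SITEMAP_NS)
    root = ET.fromstring(sitemap_xml)
    needs_debugging = []
    # Iterate over a copy because entries may be removed from the tree.
    for entry in list(root.findall(f"{{{SITEMAP_NS}}}url")):
        loc = entry.find(f"{{{SITEMAP_NS}}}loc")
        if loc is None:
            continue
        url = (loc.text or "").strip()
        status = statuses.get(url, 200)
        if url in moved:
            loc.text = moved[url]        # step 3: list the destination
        elif status in (404, 410):
            root.remove(entry)           # step 2: drop deleted pages
        elif status >= 500:
            needs_debugging.append(url)  # step 4: keep the URL, flag it
    return ET.tostring(root, encoding="unicode"), needs_debugging
```

URLs that returned a 5xx status stay in the output on purpose: they are returned in the second value so you can fix the server issue before resubmitting, rather than hiding the problem by deleting the entry.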
Real-world example
A SaaS company's sitemap listed 1,800 URLs, of which 640 returned 404 after they migrated from WordPress to a headless CMS. Google stopped recrawling the sitemap for 9 days. After cleaning the list to 1,160 live URLs, processing resumed and 73 new pages were indexed within 2 weeks.
Common mistakes
- Listing 301-redirecting URLs instead of their final destination
- Including soft-404 pages (200 status but thin/empty content)
- Treating intermittent 503s as "temporary" and leaving them in the sitemap
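The three mistakes above can be caught with a simple classifier run over repeated checks of the same URL. The sketch below is only a rough heuristic: the Probe type, the classify function, and the 512-byte threshold for "thin" content are assumptions for this example, and it presumes each probe was made without following redirects so that 3xx codes are visible.

```python
from dataclasses import dataclass

REDIRECT_CODES = (301, 302, 307, 308)


@dataclass
class Probe:
    """One fetch of a sitemap URL, made without following redirects (assumed)."""
    status: int
    body_length: int = 0  # response body size in bytes


def classify(probes, min_body_length=512):
    """Label a URL from several probes of the same URL taken over time."""
    if not probes:
        raise ValueError("at least one probe is required")
    statuses = [probe.status for probe in probes]
    if any(status in REDIRECT_CODES for status in statuses):
        return "redirect: list the final destination instead"
    if 503 in statuses:
        # Even a single 503 among otherwise healthy probes is flagged.
        return "intermittent 503: investigate rather than ignore"
    if all(status == 200 for status in statuses):
        # min_body_length is an arbitrary threshold; tune it per site.
        if min(probe.body_length for probe in probes) < min_body_length:
            return "possible soft 404: thin or empty content"
        return "ok"
    return "other error"
```

For example, classify([Probe(200, 5000), Probe(503)]) flags the URL as an intermittent 503 even though one check succeeded, which is the case that is easy to dismiss as temporary. Body length alone is a weak signal for soft 404s, so treat that label as a prompt for manual review.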