URLs Blocked by Robots.txt in Your Sitemap

Updated April 2026 · By SitemapFixer Team

A sitemap says "please crawl these URLs" while robots.txt says "do not crawl these URLs." When the same URL appears in both, you're sending Google contradictory instructions, and Google resolves the conflict by honoring the disallow. The URL never gets crawled (at best it is indexed without content), and your sitemap loses credibility as a discovery signal.

Cross-check your sitemap against robots.txt
We match every sitemap URL against your active Disallow rules
Analyze My Sitemap

What is this error?

This error occurs when a URL is listed inside your sitemap.xml but a matching Disallow rule exists in your robots.txt file. In Google Search Console this appears under "Indexed, though blocked by robots.txt" or "Blocked by robots.txt" in the Pages report. Bing Webmaster Tools surfaces it as "Allow/Disallow conflict."
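Concretely, the conflict looks like this: robots.txt blocks a directory while sitemap.xml advertises a URL inside it (domain and paths here are illustrative).

robots.txt:
User-agent: *
Disallow: /private/

sitemap.xml (excerpt):
<url>
  <loc>https://example.com/private/annual-report.html</loc>
</url>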

Why does it happen?

The most frequent causes are overly broad Disallow rules (for example, Disallow: /api is a path prefix, so it also blocks /api-documentation/), staging rules that were never removed in production, and auto-generated sitemaps that crawl the whole site without respecting robots.txt. E-commerce sites often see this when filter URLs are disallowed but the sitemap generator includes them anyway.
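You can reproduce the prefix trap with Python's standard library. A minimal sketch, with illustrative rules and URLs (note that urllib.robotparser does plain prefix matching and does not implement Google's * and $ wildcard extensions):

from urllib import robotparser

# Rules are illustrative; "Disallow: /api" is a path PREFIX, not a directory
rules = """
User-agent: *
Disallow: /api
"""

rp = robotparser.RobotFileParser()
rp.parse(rules.splitlines())

print(rp.can_fetch("*", "https://example.com/api/v1/users"))        # False, intended
print(rp.can_fetch("*", "https://example.com/api-documentation/"))  # False, accidental
# With "Disallow: /api/" (trailing slash) the docs page would stay crawlable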

Why does it hurt SEO?

Blocked URLs in a sitemap are effectively dead entries - they consume space, waste the sitemap's "attention budget," and contribute nothing. Worse, Google may interpret a high ratio of blocked-to-allowed URLs as a sign that your sitemap is unreliable and reduce how seriously it treats every URL in the file, including the healthy ones.

How to detect it

Use Search Console's URL Inspection tool to test individual URLs: its crawl details report when robots.txt blocked the fetch. (The standalone robots.txt Tester under Legacy Tools has been retired.) Sitemap Fixer automates the check at scale by parsing your robots.txt, compiling the Disallow rules, and flagging every sitemap URL that matches one, including wildcard and path-prefix matches.
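A first-pass version of that cross-check fits in a short script. This sketch is not Sitemap Fixer's implementation: it uses only the Python standard library, the two URLs are placeholders, and urllib.robotparser is a simpler matcher than Google's (no wildcard support).

import urllib.request
import xml.etree.ElementTree as ET
from urllib import robotparser

SITEMAP_URL = "https://example.com/sitemap.xml"   # placeholder
ROBOTS_URL = "https://example.com/robots.txt"     # placeholder
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Fetch and parse the live robots.txt
rp = robotparser.RobotFileParser(ROBOTS_URL)
rp.read()

# Collect every <loc> from the sitemap
with urllib.request.urlopen(SITEMAP_URL) as resp:
    tree = ET.parse(resp)
urls = [loc.text.strip() for loc in tree.findall(".//sm:loc", NS) if loc.text]

# Flag sitemap URLs that robots.txt blocks for Googlebot
blocked = [u for u in urls if not rp.can_fetch("Googlebot", u)]
print(f"{len(blocked)} of {len(urls)} sitemap URLs are blocked")
for u in blocked:
    print("BLOCKED:", u)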

How to fix it

1. Decide the true intent for each blocked URL: should it be indexed or not?
2. For URLs that SHOULD be indexed: remove the Disallow rule or add a more specific Allow line.
3. For URLs that SHOULD NOT be indexed: remove them from the sitemap entirely (one way to automate this is sketched below).
4. Test the updated robots.txt before deploying; with the legacy Google tester retired, use any REP-compliant checker, such as Google's open-source robots.txt parser.
5. Regenerate the sitemap and resubmit it in Search Console.
6. Monitor the Pages report for 7-14 days to confirm the "Blocked by robots.txt" count drops.
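For step 3, the durable fix is to filter the URL list against robots.txt at sitemap-generation time so blocked URLs never re-enter the file. A minimal sketch under the same assumptions as above (placeholder domain and URL list, standard library only):

import xml.etree.ElementTree as ET
from urllib import robotparser

rp = robotparser.RobotFileParser("https://example.com/robots.txt")  # placeholder
rp.read()

candidate_urls = [
    "https://example.com/products/blue-widget",
    "https://example.com/search?q=widgets",  # disallowed: must not reach the sitemap
]

# Drop anything a crawler is not allowed to fetch
allowed = [u for u in candidate_urls if rp.can_fetch("Googlebot", u)]

urlset = ET.Element("urlset", xmlns="http://www.sitemaps.org/schemas/sitemap/0.9")
for u in allowed:
    ET.SubElement(ET.SubElement(urlset, "url"), "loc").text = u
ET.ElementTree(urlset).write("sitemap.xml", encoding="utf-8", xml_declaration=True)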

Real-world example

A marketplace had Disallow: /search in robots.txt to block internal search pages, but their sitemap generator included every /search?q= URL it could find - 14,000 of them. After removing those URLs from the sitemap, Google's "Blocked by robots.txt" count dropped from 14,200 to 240 in 10 days, and crawl budget for product pages increased measurably.

Frequently Asked Questions

Will Google ignore my robots.txt if a URL is in the sitemap?
No. Robots.txt always wins. If a URL is disallowed, Google will not crawl it, even if your sitemap lists it. The sitemap entry just gets flagged as a contradictory signal.
Can blocked URLs still appear in Google search results?
Yes, but usually with no description: the result shows the bare URL with a note like 'No information is available for this page.' This happens because Google knows the URL exists from the sitemap but cannot fetch it to read the content.
How do I unblock a URL without removing the disallow rule?
Add a more specific Allow rule in robots.txt that overrides the broader Disallow. For example: Disallow: /admin/ followed by Allow: /admin/public-faq. Google honors the longest matching path.
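Be careful if you verify this with Python: urllib.robotparser applies rules in file order (first match wins), so it can disagree with Google's longest-match behavior. Here is a standalone sketch of the documented Google precedence, with illustrative rules (longest matching path wins; on a tie, Allow wins; wildcards omitted):

def google_allows(path: str, rules: list[tuple[str, bool]]) -> bool:
    """rules are (path_prefix, is_allow) pairs; wildcards are omitted here."""
    best_len, best_allow = -1, True  # no matching rule means the path is allowed
    for prefix, is_allow in rules:
        if path.startswith(prefix):
            # Prefer the longer match; on equal length, prefer Allow
            if len(prefix) > best_len or (len(prefix) == best_len and is_allow):
                best_len, best_allow = len(prefix), is_allow
    return best_allow

rules = [("/admin/", False), ("/admin/public-faq", True)]
print(google_allows("/admin/public-faq", rules))  # True: the Allow rule is longer
print(google_allows("/admin/settings", rules))    # False: only the Disallow matches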
Fix this in your sitemap now
Enter your domain and get a full sitemap audit in 60 seconds
Analyze My Sitemap Free