Sitemap Blocked by Robots.txt: How to Find and Fix It
A robots.txt block on a page listed in your sitemap is one of the most confusing errors in Google Search Console — your sitemap is actively telling Google to visit a page while robots.txt is simultaneously forbidding the crawl. Google cannot index what it cannot read, so these pages produce zero organic visibility regardless of their content quality. This guide explains exactly how to find the conflicting rules, which fix to apply, and how to prevent the issue from recurring after site changes.
What the Error Means
When Google Search Console reports 'Blocked by robots.txt' for URLs in your sitemap, it means your sitemap is telling Google to index a page that your robots.txt is simultaneously telling it not to crawl. This is a direct contradiction. Google cannot index a page it cannot read, so these URLs will not appear in search results regardless of their content quality. The error is common and easy to fix once you know where to look.
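To see the contradiction concretely, here is a minimal sketch using Python's standard-library robots.txt parser; the domain, path, and rule are invented for the illustration.

```python
# Minimal illustration: a page listed in the sitemap vs. a rule that forbids crawling it.
from urllib import robotparser

robots_txt = """\
User-agent: *
Disallow: /blog/
"""

rp = robotparser.RobotFileParser()
rp.parse(robots_txt.splitlines())

# Suppose https://example.com/blog/post-1 is listed in sitemap.xml.
# The Disallow rule above forbids crawling it, so Google cannot read
# the page and therefore cannot index its content.
print(rp.can_fetch("Googlebot", "https://example.com/blog/post-1"))  # False
```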
How to Find the Conflict
Step 1: In Google Search Console, go to Indexing, then Pages, and filter by 'Blocked by robots.txt'. Note the exact URLs affected.
Step 2: Open yoursite.com/robots.txt in your browser and look for Disallow rules that match the affected URLs. Common culprits: Disallow: / (blocks everything), Disallow: /wp-admin/ catching pages that were accidentally placed in that directory, and wildcard rules like Disallow: /*?* that unintentionally match sitemap URLs containing query parameters.
Step 3: Test any suspicious URL with Google Search Console's URL Inspection tool, which reports whether the URL is blocked by robots.txt.
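Checking URLs one at a time gets tedious on a large sitemap. The sketch below automates the same comparison with Python's standard library; the site address is a placeholder, it assumes a single sitemap at /sitemap.xml (not a sitemap index), and it cannot evaluate Google-style wildcard rules, which the stdlib parser treats as literal text rather than wildcards.

```python
# Rough audit sketch: flag every sitemap URL that robots.txt disallows for Googlebot.
import xml.etree.ElementTree as ET
from urllib import robotparser, request

SITE = "https://example.com"  # placeholder: replace with your own site

# Fetch and parse the live robots.txt.
rp = robotparser.RobotFileParser()
rp.set_url(f"{SITE}/robots.txt")
rp.read()

# Fetch the sitemap and pull out every <loc> entry.
with request.urlopen(f"{SITE}/sitemap.xml") as resp:
    tree = ET.parse(resp)
ns = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
urls = [loc.text.strip() for loc in tree.findall(".//sm:loc", ns) if loc.text]

# Report every sitemap URL the rules would block.
blocked = [u for u in urls if not rp.can_fetch("Googlebot", u)]
for u in blocked:
    print("Blocked by robots.txt:", u)
print(f"{len(blocked)} of {len(urls)} sitemap URLs are blocked")
```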
The Two Ways to Fix It
Option A - Remove from sitemap: if the blocked pages should not be indexed (admin pages, checkout, login), remove them from your sitemap. Your sitemap should only contain pages you actually want Google to index. This is the right fix when the robots.txt block is intentional.
Option B - Update robots.txt: if the blocked pages should be indexed, remove or narrow the robots.txt rule that is blocking them. This is the right fix when you accidentally blocked pages you need indexed, for example when a blanket Disallow rule is catching pages you want ranked.
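As a small illustration of Option B, the sketch below compares a blanket rule with a narrowed one using the stdlib parser; the rules and the page URL are hypothetical.

```python
# Narrowing a blanket Disallow: the page you want ranked becomes crawlable,
# while the admin area stays blocked.
from urllib import robotparser

def crawlable(robots_txt: str, url: str) -> bool:
    rp = robotparser.RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return rp.can_fetch("Googlebot", url)

before = "User-agent: *\nDisallow: /\n"           # blocks the whole site
after = "User-agent: *\nDisallow: /wp-admin/\n"   # blocks only the admin area

page = "https://example.com/services/plumbing"
print(crawlable(before, page))  # False - blocked by the blanket rule
print(crawlable(after, page))   # True  - only /wp-admin/ remains blocked
```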
The Development Mode Trap
The most catastrophic version of this error: robots.txt was set to Disallow: / to keep the site out of search during development, and nobody removed the rule after launch. Your entire site is blocked. Google discovers the pages through your sitemap but cannot crawl any of them. Check your robots.txt immediately after every site launch; this single oversight can mean zero organic traffic for months before anyone notices.
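One way to catch this is a quick smoke test in the launch checklist. A minimal sketch, assuming a hypothetical domain: it exits with an error if the live robots.txt still blocks the homepage, the signature of a leftover development-mode Disallow: /.

```python
# Post-launch smoke test: fail loudly if robots.txt still blocks the whole site.
import sys
from urllib import robotparser

rp = robotparser.RobotFileParser()
rp.set_url("https://example.com/robots.txt")  # placeholder domain
rp.read()

# If even the homepage cannot be fetched, a blanket Disallow: / is almost
# certainly still in place from development.
if not rp.can_fetch("Googlebot", "https://example.com/"):
    sys.exit("robots.txt still blocks the entire site - remove the dev-mode Disallow: /")
print("robots.txt allows crawling of the homepage")
```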
How to Test Your Fix
After updating robots.txt or your sitemap, run an affected URL through Google Search Console's URL Inspection tool and click Test live URL. If it still shows as blocked, wait up to 24 hours: Google caches robots.txt for roughly that long before refetching it. After that, resubmit your sitemap in Search Console under Sitemaps. Google will reprocess the submitted URLs against the updated robots.txt and the errors should clear within a few days.
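Before attributing a stale result to Google's cache, it is worth confirming that your server and any CDN in front of it are actually serving the updated file. A minimal check with a placeholder URL:

```python
# Print the status, Last-Modified header, and raw contents of the live robots.txt
# so you can confirm the new rules are really being served.
from urllib import request

with request.urlopen("https://example.com/robots.txt") as resp:
    print("HTTP status:", resp.status)
    print("Last-Modified:", resp.headers.get("Last-Modified", "not sent"))
    print(resp.read().decode("utf-8", errors="replace"))
```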
Preventing Future Conflicts
The best prevention: never include a URL in your sitemap if it is blocked in robots.txt. Run a regular audit: SitemapFixer checks every URL in your sitemap against your robots.txt automatically and flags conflicts immediately. Before launching a new site or after any major robots.txt change, confirm that your rules still allow the pages you need indexed; Search Console's robots.txt report shows the rules Google is currently using (the interactive robots.txt Tester has been retired), and URL Inspection tells you whether a specific page is blocked.
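If robots.txt lives in version control, a small pre-deploy gate can enforce this automatically. The sketch below is one possible shape, not a prescribed workflow: it parses the robots.txt file that is about to ship and fails the build if any URL from a hand-picked critical list would be blocked. The file path and URL list are placeholders.

```python
# Pre-deploy gate: refuse to ship a robots.txt that blocks must-index URLs.
import sys
from urllib import robotparser

MUST_BE_CRAWLABLE = [  # placeholder list of pages that must stay indexable
    "https://example.com/",
    "https://example.com/products/widget",
    "https://example.com/blog/latest-post",
]

rp = robotparser.RobotFileParser()
with open("robots.txt", encoding="utf-8") as fh:  # the file in your repo
    rp.parse(fh.read().splitlines())

blocked = [u for u in MUST_BE_CRAWLABLE if not rp.can_fetch("Googlebot", u)]
if blocked:
    sys.exit("robots.txt change would block: " + ", ".join(blocked))
print("All critical URLs remain crawlable")
```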
Wildcard Rules That Accidentally Block Too Much
Robots.txt wildcard rules using * are a common source of unintended blocks. For example, Disallow: /*?* intended to block faceted navigation URLs can also block legitimate pages that happen to use query parameters. Disallow: /products/ intended to block a staging directory can block your entire product catalog if the live site uses the same path. Always test wildcard rules against your actual URLs before deploying, using URL Inspection or a Google-compatible robots.txt checker (Search Console's old robots.txt Tester has been retired). Check at least 10-20 real URLs from each section of your site to confirm the rule does not match anything you need indexed; the sketch below is one way to do that in bulk.
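Note that Python's stdlib robotparser does not understand * or $ at all, so it cannot spot-check wildcard rules. This sketch approximates Google's matching, where * matches any run of characters and a trailing $ anchors the end of the URL, just well enough to try one pattern against sample URLs; it is not a full robots.txt parser and ignores rule precedence.

```python
# Approximate Google-style wildcard matching for a single Disallow pattern.
import re
from urllib.parse import urlparse

def to_regex(pattern: str) -> str:
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = "".join(".*" if ch == "*" else re.escape(ch) for ch in pattern)
    return "^" + body + ("$" if anchored else "")

def rule_matches(pattern: str, url: str) -> bool:
    parts = urlparse(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    return re.match(to_regex(pattern), target) is not None

# Try the wildcard rule from the example above against a mix of URLs.
for url in [
    "https://example.com/shoes?color=red",   # faceted URL you meant to block
    "https://example.com/search?q=boots",    # also matched - is that intended?
    "https://example.com/guides/sizing",     # no query string, not matched
]:
    print(url, "->", "matched" if rule_matches("/*?*", url) else "not matched")
```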
When the Sitemap Itself Is Blocked
A less obvious variant: your sitemap URL itself is blocked by robots.txt, preventing Google from reading it at all. This typically happens with rules like Disallow: /*.xml or Disallow: /sitemap* that were added to block certain file types or patterns. Check by testing your sitemap URL in Google Search Console's URL Inspection tool: if it shows 'Blocked by robots.txt' with your sitemap URL as the target, that is the problem. Fix it by adding Allow: /sitemap.xml to the same group of rules; for Google the order of the lines does not matter, because the most specific (longest) matching rule wins and Allow wins ties, so Allow: /sitemap.xml overrides broader patterns like Disallow: /*.xml.
User-Agent Specific Blocks Causing Partial Issues
Your robots.txt may have a Googlebot-specific block that your own tests never hit. Browsers ignore robots.txt entirely, and a generic crawler only obeys the User-agent: * rules, so the blocked URLs appear to load fine everywhere except to Googlebot. This pattern appears when developers add a User-agent: Googlebot group with Disallow: / to hide a section from Google specifically while keeping it reachable for visitors. Check your robots.txt for any User-agent: Googlebot rules and make sure they are intentional; legitimate uses include keeping low-value sections out of Googlebot's crawl, but a forgotten Googlebot block on pages in your sitemap produces exactly this error.
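A quick way to surface such blocks is to evaluate the same URL twice, once as Googlebot and once as a generic crawler; the stdlib parser does honour User-agent groups. The rules and URL below are invented for the example.

```python
# Compare crawl permission for Googlebot vs. any other crawler.
from urllib import robotparser

robots_txt = """\
User-agent: Googlebot
Disallow: /partners/

User-agent: *
Disallow:
"""

rp = robotparser.RobotFileParser()
rp.parse(robots_txt.splitlines())

url = "https://example.com/partners/acme"
print("Googlebot: ", rp.can_fetch("Googlebot", url))      # False - blocked for Google only
print("Other bots:", rp.can_fetch("SomeOtherBot", url))    # True  - everyone else may crawl
```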
How Google Handles the Conflict Between Sitemap and Robots.txt
When Google encounters a URL in your sitemap that is blocked in robots.txt, it cannot crawl the page, so it cannot index it properly regardless of what the sitemap says. Google reports the URL in Search Console under 'Blocked by robots.txt' in the Pages report. The URL accumulates no crawl data and no ranking signals for as long as the block persists. (In rare cases Google indexes a blocked URL without its content, based purely on links pointing to it, but such a listing has no description and rarely ranks.) Importantly, the errors do not clear by themselves the moment you change something: fix the conflict first, either by removing the URLs from the sitemap or by adjusting robots.txt, and then resubmit the sitemap so Google reprocesses the submitted URLs against the current rules.