Indexed, though blocked by robots.txt (GSC)

Updated April 2026 · By SitemapFixer Team

"Indexed, though blocked by robots.txt" is one of the most confusing GSC statuses. It means Google added your URL to its index even though your robots.txt file told Googlebot not to crawl it. This happens because robots.txt controls crawling, not indexing - and Google can index a URL based on external links alone, without ever fetching the page content. The result is a URL in search with no description and no way for Google to see your content.

Find URLs blocked by robots.txt on your site
We surface every Disallow conflict in your sitemap in 60 seconds
Analyze My Sitemap

What this GSC status means

Google discovered your URL through an external signal - usually a backlink from another site, a mention in a sitemap, or an internal link - but your robots.txt file includes a Disallow rule that matches the URL. Googlebot obeyed the rule and did not crawl the page. However, Google decided the URL was significant enough to include in its index anyway. In search results, the listing appears with the URL, no description ("A description for this result is not available because of this site's robots.txt"), and no title beyond what external anchors suggest. This is different from "Blocked by robots.txt" - that status means Google respected the block and did not index the URL.

Why this is bad

Common causes (including WordPress)

How to fix it - three options

Option A: Allow crawling + add noindex (recommended). Remove the Disallow rule so Googlebot can fetch the page, then add <meta name="robots" content="noindex"> or an X-Robots-Tag header. Once Google recrawls and sees the noindex, it drops the URL from the index within 1-4 weeks.
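A minimal sketch of the two noindex signals from Option A. The URL in the curl comments is a placeholder, not a real endpoint:

```shell
# Page-level signal: this tag belongs in the page's <head>.
meta='<meta name="robots" content="noindex">'
echo "$meta"

# Server-level alternative: an X-Robots-Tag response header set by the
# web server or CDN. Once the Disallow rule is gone, verify that
# Googlebot can now see one of the two signals (placeholder URL):
#   curl -sI https://example.com/private-page | grep -i x-robots-tag
#   curl -s  https://example.com/private-page | grep -i 'name="robots"'
```

Either signal works; use the header when you can't edit the page templates (e.g. PDFs or generated pages).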

Option B: Remove the external links pointing to the URL. Only realistic if you control the linking sites or the links are internal. Not practical for organic backlinks.

Option C: Leave it and do nothing. Valid if the URL should never appear in search and you don't care about the description-less snippet. For truly sensitive URLs, use password protection or HTTP auth instead - robots.txt was never meant for security.
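For the truly-sensitive case in Option C, real access control looks like this; a hedged sketch assuming Apache with .htaccess overrides enabled (all paths and the username are placeholders):

```shell
# Write a minimal .htaccess that password-protects the directory.
# Create the credentials file separately, e.g.:
#   htpasswd -c /etc/apache2/.htpasswd someuser
cat > .htaccess <<'EOF'
AuthType Basic
AuthName "Restricted"
AuthUserFile /etc/apache2/.htpasswd
Require valid-user
EOF

# Confirm the three Auth* directives were written:
grep -c '^Auth' .htaccess
```

Unlike robots.txt, this stops Google (and everyone else) from ever fetching the content, so there is nothing to index.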

Step-by-step fix

1. Open GSC > Indexing > Pages and export the list of URLs with this status.
2. Identify which Disallow rule in robots.txt matches each URL (use the robots.txt report under GSC > Settings).
3. Remove or narrow the Disallow rule in robots.txt so Googlebot can crawl the URL.
4. Add a noindex meta tag to each affected page - <meta name="robots" content="noindex, follow"> - or send an X-Robots-Tag: noindex HTTP header at the server/CDN level.
5. Verify with GSC URL Inspection that the page is now crawlable and the noindex is detected.
6. Click "Request Indexing" to speed up the recrawl.
7. Wait 1-4 weeks for Google to drop the URL from its index.
8. Once the URL is deindexed, you can re-add the Disallow rule if you want to save crawl budget.
9. For sensitive URLs, also add HTTP auth or IP allowlisting - robots.txt is not a security control.
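Step 2 can also be done at scale without the GSC UI. A rough sketch that does prefix-only matching against a sample robots.txt (real robots.txt parsing also handles * and $ wildcards, so treat this as a first pass):

```shell
# Sample robots.txt for illustration.
cat > robots.txt <<'EOF'
User-agent: *
Disallow: /search/
Disallow: /tmp/
EOF

# Find the first Disallow rule whose prefix matches a given URL path.
path="/search/results?q=reset+password"
match=$(grep -i '^Disallow:' robots.txt | awk '{print $2}' | while read -r rule; do
  case "$path" in "$rule"*) echo "$rule"; break ;; esac
done)
echo "matched rule: ${match:-none}"
```

Run this over the exported URL list to see which rule is responsible for each blocked URL before you start editing robots.txt.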

The mental model most guides miss

Here's the thing most guides get wrong about this status. Robots.txt blocks CRAWLING, not INDEXING. Those are different operations.

Crawling = fetching the page content. Indexing = adding the URL (with whatever context Google can gather) to the search index. Google can index a URL it has never fetched. It just uses the URL itself, anchor text from external links, and surrounding context on those external pages as the "content."

This is why the status exists. Google saw enough external signals pointing to a URL to decide it's a real thing worth indexing, but your robots.txt said "don't fetch the content." Google honors the fetch rule and indexes it anyway with blank content.

A real case: 400+ /search/ URLs indexed as blanks

A client running a content site had Disallow: /search/ in robots.txt to prevent Google from wasting crawl budget on internal search results. Smart move, right? Except their site had been around for 12 years, and over that time users had posted links to their internal search URLs on Reddit, Stack Exchange, old forums, and other sites.

GSC showed 437 URLs under /search/ as "Indexed, though blocked by robots.txt." Every one appeared in a Google site: search as a bare listing - no description, and no real title beyond what Google inferred from the anchor text of external links. The listings read like "how do i reset my password yoursite.com/search?q=reset+password" - which looked spammy and auto-generated.

Fix took 10 minutes. We removed Disallow: /search/. Added <meta name="robots" content="noindex, follow"> to the search results template. Waited 3 weeks. Google recrawled, saw the noindex, dropped all 437 URLs from the index. Then we re-added the Disallow to save crawl budget.

The sequence matters: unblock, add noindex, wait for deindex, re-block. If you re-add the Disallow before Google recrawls, Googlebot never gets to see the noindex tag and the URLs stay stuck in the index.
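The sequence above, expressed as the two robots.txt states (the /search/ path comes from the case study; the filenames are illustrative):

```shell
# Step 1: unblock -- remove the Disallow so Googlebot can crawl /search/.
cat > robots.unblocked.txt <<'EOF'
User-agent: *
EOF

# Step 2: the search-results template gets a noindex tag, e.g.
#   <meta name="robots" content="noindex, follow">
# Step 3: wait until GSC reports the URLs as deindexed (1-4 weeks).

# Step 4: only then restore the block to save crawl budget.
cat > robots.reblocked.txt <<'EOF'
User-agent: *
Disallow: /search/
EOF
```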

When this status is actually OK to ignore

Not every occurrence needs fixing. Cases where I tell clients to leave it alone:

Fix it if the URLs are discoverable patterns (search results, filters, category archives) that a user might actually click on and be confused by.

Common mistakes when fixing this

How to diagnose which URLs need fixing

# Step 1: Export affected URLs from GSC
#   Indexing > Pages > "Indexed, though blocked by robots.txt"
#   Click the row, then Export

# Step 2: Check each URL's status at scale
while IFS= read -r url; do
  status=$(curl -o /dev/null -s -w "%{http_code}" "$url")
  echo "$status $url"
done < affected_urls.txt

# Step 3: For each URL, identify which Disallow rule matches
# Use the robots.txt report: GSC > Settings > robots.txt

# Step 4: Check backlinks pointing to the URL
# Ahrefs/Majestic site explorer > exact URL mode

URLs with zero external backlinks and thin internal linking are safe to leave blocked - Google drops those on its own within 6-12 months. URLs with steady inbound link equity are the ones worth actively fixing.

The "noindex in robots.txt" trap

Between 2008 and 2019, some sites used Noindex: as a directive inside robots.txt. It sort of worked in Google but was never part of the spec. Google officially stopped supporting it in September 2019. Many old guides still recommend it.

If you inherited a site with Noindex: /path/ rules in robots.txt, they're doing absolutely nothing now. Replace them with meta robots or X-Robots-Tag headers on the actual pages.
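A quick way to audit an inherited robots.txt for the dead directive (the sample file here is invented for illustration):

```shell
# Sample robots.txt containing a deprecated Noindex: line.
cat > robots.txt <<'EOF'
User-agent: *
Disallow: /admin/
Noindex: /old-landing/
EOF

# List any Noindex: directives with line numbers -- Google has
# ignored these since September 2019.
grep -in '^noindex:' robots.txt
```

Any hits should be deleted from robots.txt and replaced with a meta robots tag or X-Robots-Tag header on the pages themselves.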

Frequently Asked Questions

Why is my page indexed though blocked by robots.txt?
Robots.txt blocks crawling, not indexing. If another site links to your URL, Google can still add it to its index based on the anchor text and external context - even without crawling the page content. The result is a URL in search results with no description.
How do I remove a page that is indexed though blocked?
Remove the Disallow rule from robots.txt so Googlebot can crawl the page, then add a noindex meta tag to the page itself. Once Google recrawls and sees the noindex, it will drop the URL from the index. You can then re-block with robots.txt if you want to save crawl budget.
Is "Indexed though blocked" the same as "Blocked by robots.txt"?
No. "Blocked by robots.txt" means Google respected your rule and did not index the URL. "Indexed though blocked" means Google indexed it anyway because of external signals, even though it could not crawl the content. They are different statuses with different fixes.
Find every robots.txt conflict on your site
Free sitemap and indexing analysis in 60 seconds
Analyze My Sitemap Free
Related GSC indexing statuses
All GSC indexing errors