Indexed, though blocked by robots.txt (GSC)
"Indexed, though blocked by robots.txt" is one of the most confusing GSC statuses. It means Google added your URL to its index even though your robots.txt file told Googlebot not to crawl it. This happens because robots.txt controls crawling, not indexing - and Google can index a URL based on external links alone, without ever fetching the page content. The result is a URL in search with no description and no way for Google to see your content.
What this GSC status means
Google discovered your URL through an external signal - usually a backlink from another site, a sitemap entry, or an internal link - but your robots.txt file includes a Disallow rule that matches the URL. Googlebot obeyed the rule and did not crawl the page. However, Google decided the URL was significant enough to include in its index anyway. In search results, the listing appears with the URL, no description ("A description for this result is not available because of this site's robots.txt"), and no title beyond what external anchor text and the URL itself suggest. This is different from "Blocked by robots.txt" - that status means Google respected the block and did not index the URL.
Why this is bad
- No description in search results - dramatically lowers click-through rate compared to a normal snippet.
- You lose control over the title - Google builds the displayed title from external anchor text and URL tokens, which may be unflattering or outdated.
- Cannot use noindex - because Google cannot crawl the page, it never sees any noindex meta tag you add. You are stuck.
- Sensitive URLs leak - admin login pages, staging URLs, or paywalled content may appear in results if robots.txt was the only protection.
- Clutters indexing reports - these URLs show up as indexed but provide zero search value, polluting your coverage metrics.
Common causes (including WordPress)
- External inbound links - another site linked to a URL you blocked. Very common for legacy URLs that moved but still attract backlinks.
- Accidental Disallow - a WordPress plugin or manual edit added an overly broad Disallow rule that matches real content URLs.
- WordPress-specific: /wp-content/ or query-string blocks - sites that block /?* end up blocking paginated or filtered URLs that have external links.
- Legacy URLs from an old CMS - you migrated from Joomla or Drupal, blocked old paths in robots.txt, but backlinks still point at them.
- Blocking admin, login, or search result pages - these often get linked externally by accident and then indexed.
- Staging environment exposed - a staging URL blocked in robots.txt was linked from social or a forum post and got indexed.
How to fix it - three options
Option A: Allow crawling + add noindex (recommended). Remove the Disallow rule so Googlebot can fetch the page, then add <meta name="robots" content="noindex"> or an X-Robots-Tag header. Once Google recrawls and sees the noindex, it drops the URL from the index, typically within 1-4 weeks (a quick command-line check for this setup follows the options below).
Option B: Remove the external links pointing to the URL. Only realistic if you control the linking sites or the links are internal. Not practical for organic backlinks.
Option C: Leave it and do nothing. Valid if the URL should never appear in search and you don't care about the description-less snippet. For truly sensitive URLs, use password protection or HTTP auth instead - robots.txt was never meant for security.
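If you go with Option A, you can sanity-check both preconditions from the command line before waiting on Google: the path is no longer disallowed, and the page now serves a noindex signal. A minimal sketch, assuming a hypothetical page at https://example.com/old-page/ - swap in your own host and path:
# 1. Confirm no Disallow rule still matches the path
curl -s https://example.com/robots.txt | grep -i "disallow"
# 2. Check for a noindex sent as an HTTP header
curl -sI https://example.com/old-page/ | grep -i "x-robots-tag"
# 3. Check for a noindex meta tag in the HTML
curl -s https://example.com/old-page/ | grep -io '<meta name="robots"[^>]*>'
If either check prints a noindex, Google will pick it up on the next crawl; if both come back empty, the page is crawlable but nothing is telling Google to drop it.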
Step-by-step fix
1. Open GSC > Pages and export the list of URLs with this status.
2. Identify which Disallow rule in robots.txt matches each URL (check the rules directly, or confirm the block with GSC's URL Inspection tool - the old standalone robots.txt Tester has been retired).
3. Remove or narrow the Disallow rule in robots.txt so Googlebot can crawl the URL.
4. Add a noindex meta tag to each affected page - <meta name="robots" content="noindex, follow"> - or send an X-Robots-Tag: noindex HTTP header at the server/CDN level.
5. Verify with GSC URL Inspection that the page is now crawlable and the noindex is detected (a bulk check over the export is sketched below).
6. Click "Request Indexing" to speed up the recrawl.
7. Wait 1-4 weeks for Google to drop the URL from its index.
8. Once the URL is deindexed, you can re-add the Disallow rule if you want to save crawl budget.
9. For sensitive URLs, also add HTTP auth or IP allowlisting - robots.txt is not a security control.
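To run step 5 across the whole export rather than one URL at a time, a small loop can flag which pages already serve a noindex and which still need the tag. A rough sketch, assuming the GSC export has been saved as affected_urls.txt with one URL per line:
# Report whether each exported URL serves a noindex header or meta tag
while IFS= read -r url; do
  header=$(curl -sI "$url" | grep -i "x-robots-tag.*noindex")
  meta=$(curl -s "$url" | grep -io '<meta name="robots"[^>]*noindex[^>]*>')
  if [ -n "$header" ] || [ -n "$meta" ]; then
    echo "noindex served:  $url"
  else
    echo "MISSING noindex: $url"
  fi
done < affected_urls.txt
Anything flagged as missing is still running the old template or being served from a cache that predates the change - recheck those before requesting indexing.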
The mental model most guides miss
Here's the thing most guides get wrong about this status. Robots.txt blocks CRAWLING, not INDEXING. Those are different operations.
Crawling = fetching the page content. Indexing = adding the URL (with whatever context Google can gather) to the search index. Google can index a URL it has never fetched. It just uses the URL itself, anchor text from external links, and surrounding context on those external pages as the "content."
This is why the status exists. Google saw enough external signals pointing to a URL to decide it's a real thing worth indexing, but your robots.txt said "don't fetch the content." Google honors the fetch rule and indexes it anyway with blank content.
A real case: 400+ /search/ URLs indexed as blanks
A client running a content site had Disallow: /search/ in robots.txt to prevent Google from wasting crawl budget on internal search results. Smart move, right? Except their site had been around for 12 years, and over that time users had posted links to their internal search URLs on Reddit, Stack Exchange, old forums, and other sites.
GSC showed 437 URLs under /search/ as "Indexed, though blocked by robots.txt." Every one appeared in Google's site: search as a blank listing - no description, and only whatever title Google could infer from the anchor text in external links. The listings read things like "how do i reset my password yoursite.com/search?q=reset+password" - which looked spammy and auto-generated.
Fix took 10 minutes. We removed Disallow: /search/. Added <meta name="robots" content="noindex, follow"> to the search results template. Waited 3 weeks. Google recrawled, saw the noindex, dropped all 437 URLs from the index. Then we re-added the Disallow to save crawl budget.
The sequence matters: unblock, add noindex, wait for the deindex, then re-block. Re-block too early - or leave the Disallow in place while adding the noindex - and Google never sees the tag, so the URLs stay stuck.
When this status is actually OK to ignore
Not every occurrence needs fixing. Cases where I tell clients to leave it alone:
- Affiliate link redirects. /go/, /out/, /link/ patterns that redirect to merchants. They're not meant to rank, and the blank listings rarely attract clicks anyway.
- Asset URLs (JS, CSS, font files). If Google indexed /assets/bundle.abc123.js as a blank, it's ugly but harmless.
- One-off URLs with a single external link. Google usually drops these eventually without intervention.
- URLs you've already fixed via removal request in GSC. The status can linger for weeks after the URL is actually deindexed.
Fix it if the URLs are discoverable patterns (search results, filters, category archives) that a user might actually click on and be confused by.
Common mistakes when fixing this
- Adding noindex without removing the Disallow. Classic mistake. Google still can't crawl, so it never sees the noindex. URL stays indexed forever.
- Using the GSC Removal tool as the only fix. Removal is temporary (90 days). Without noindex, the URL comes back.
- Password-protecting instead of noindexing. Password protection returns a 401, not a noindex, so Google may still keep the URL indexed with a note that authorization is required (see the quick check after this list).
- Expecting "Request Indexing" to remove a URL. Request Indexing asks Google to crawl the URL and reconsider it. If a noindex is in place, that recrawl gets it deindexed; if not, you've just reinforced the index entry.
- Thinking robots.txt protects sensitive URLs. It doesn't. Anyone can read robots.txt and see exactly what you're trying to hide. For real privacy, use HTTP auth or IP allowlisting.
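To see why the password-protection mistake backfires, look at what a protected URL actually returns to a crawler. A quick sketch with a hypothetical protected path - a 401 status with no X-Robots-Tag means there is no noindex for Google to obey:
# Print the status line and any X-Robots-Tag header for a protected URL
curl -s -o /dev/null -D - https://example.com/private/ | grep -iE "^HTTP|x-robots-tag"
# A 401 with no X-Robots-Tag means nothing is telling Google to deindex,
# so the URL can remain in the index as a blank listing.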
How to diagnose which URLs need fixing
# Step 1: Export affected URLs from GSC
# Indexing > Pages > "Indexed, though blocked by robots.txt"
# Click the row, then Export
# Step 2: Check each URL's status at scale
while IFS= read -r url; do
status=$(curl -o /dev/null -s -w "%{http_code}" "$url")
echo "$status $url"
done < affected_urls.txt
# Step 3: For each URL, identify which Disallow rule matches
# Use GSC > Settings > robots.txt (the robots.txt report; the standalone robots.txt Tester is retired)
# or use the scripted prefix check sketched below
# Step 4: Check backlinks pointing to the URL
# Ahrefs/Majestic site explorer > exact URL mode
URLs with zero external backlinks and thin internal linking are safe to leave blocked - Google drops those on its own within 6-12 months. URLs with steady inbound link equity are the ones worth actively fixing.
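Step 3 can also be scripted instead of checked one URL at a time. A rough sketch that tests each affected URL's path against the Disallow prefixes in robots.txt - it only handles plain prefix rules, not wildcards, and example.com stands in for your own domain:
# Pull the Disallow prefixes out of robots.txt (simple prefix rules only)
curl -s https://example.com/robots.txt | grep -i "^disallow:" | sed 's/^[Dd]isallow:[[:space:]]*//' | tr -d '\r' > disallow_prefixes.txt
# Report which prefix, if any, matches each affected URL's path
while IFS= read -r url; do
  path="/${url#*://*/}"
  while IFS= read -r prefix; do
    [ -z "$prefix" ] && continue
    case "$path" in
      "$prefix"*) echo "$url  <-  Disallow: $prefix" ;;
    esac
  done < disallow_prefixes.txt
done < affected_urls.txt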
The "noindex in robots.txt" trap
Between 2008 and 2019, some sites used Noindex: as a directive inside robots.txt. It sort of worked in Google but was never part of the spec. Google officially stopped supporting it in September 2019. Many old guides still recommend it.
If you inherited a site with Noindex: /path/ rules in robots.txt, they're doing absolutely nothing now. Replace them with meta robots or X-Robots-Tag headers on the actual pages.
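A quick way to audit an inherited robots.txt for these dead rules, assuming it lives at the usual /robots.txt path on your domain:
# List any obsolete Noindex: lines still sitting in robots.txt
curl -s https://example.com/robots.txt | grep -in "^noindex:"
# Anything this prints has been ignored by Google since September 2019 -
# move it to meta robots tags or X-Robots-Tag headers on the pages themselves.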