By SitemapFixer Team
Updated April 2026

robots.txt Noindex: Why It Does Not Work (And What to Use Instead)


If you are reading this because you tried adding Noindex: /private/ to your robots.txt and it did not work, here is the short answer: it has not worked since September 1, 2019, and even before that it was an undocumented Google extension that was never part of the robots.txt specification. Today, Googlebot reads the line and silently discards it. Worse, the URLs you wanted to deindex are probably still in the search results, because the directive you needed lives on the page itself, not in robots.txt. This guide walks through why this myth is so persistent, what actually happens when you try to use it, and how to fix sites that still depend on it.

The Myth: Noindex in robots.txt

The myth goes like this: you can tell Google to keep a URL out of the index by adding a Noindex: directive to your robots.txt file, the same way you would add a Disallow:. The reasoning sounds clean: one file, one place to manage all crawler instructions, no need to touch every page. SEO blogs, Stack Overflow answers, and plenty of older tutorials described this approach as if it were an officially supported feature, even though Google never documented it.

Here is the broken approach you may have seen recommended:

# robots.txt - THIS DOES NOT WORK
User-agent: *
Disallow: /admin/
Noindex: /private/
Noindex: /tag/
Noindex: /author/
Noindex: /search/

# Goal: keep these paths out of Google's index
# Reality: silently ignored since Sep 1, 2019

Every Noindex: line in that example is a no-op. Googlebot parses the file, sees a directive it does not recognize as part of the official spec, and skips it. Your URLs at /tag/, /author/, and /search/ can be — and frequently are — indexed regardless.

Google's Official Deprecation in 2019

On July 1, 2019, Google announced it was open-sourcing its robots.txt parser and pushing for the Robots Exclusion Protocol to become a formal IETF standard. As part of that push, it published a follow-up note the next day listing unsupported rules that would be retired on September 1, 2019. The list included Noindex, Nofollow, and Crawl-delay when used inside robots.txt.

The official Google Search Central post was unambiguous: "In the interest of maintaining a healthy ecosystem and preparing for potential future open source releases, we're retiring all code that handles unsupported and unpublished rules (such as noindex) on September 1, 2019." That is the cutoff. Any directive that was not part of the formal spec — including Noindex: in robots.txt — became officially unsupported on that date.

It is important to understand that Noindex in robots.txt was never officially supported in the first place. Google honored it informally for several years because enough sites used it, but it was always an unstable convention. Other search engines (Bing, Yandex, DuckDuckGo) never supported it at all. So even during the years Google quietly processed it, your noindex coverage was inconsistent across search engines.

What Actually Happens When You Use It Today

If you add Noindex: to your robots.txt today, three things happen:

1. Googlebot ignores the line silently. No error, no warning in Google Search Console. The robots.txt tester (in legacy GSC) will not flag it as invalid because the parser simply skips lines it does not recognize. Your file passes validation, but the directive does nothing.

2. The URLs you wanted to noindex remain crawlable and indexable. If they are linked from anywhere — your sitemap, internal navigation, external sites — Google can index them. You may see them appear in site:yourdomain.com searches and in the GSC Pages report under "Indexed" categories.

3. You get a false sense of security. Because there is no error message, site owners assume the rule is working. Months later they discover their /admin/ pages, internal search results, or staging URLs are indexed and pulling in unwanted impressions — sometimes even ranking for branded queries.

This silent-ignore behavior is the worst-case outcome for an SEO directive. A loud failure (a 500 error, a parse error in GSC) would force you to fix it. A silent failure can persist for years.

The Correct Alternative: Meta Robots Noindex

The official, supported way to tell Google not to index a page is the meta robots tag in the page's HTML <head>. This is a per-page directive — Googlebot fetches the page, sees the tag, and excludes the URL from the index.

<!-- Correct: meta robots noindex in the page <head> -->
<!DOCTYPE html>
<html>
<head>
  <meta charset="utf-8">
  <title>Internal Admin Page</title>
  <meta name="robots" content="noindex, nofollow">
</head>
<body>
  <!-- page content -->
</body>
</html>

<!-- Variations:
     content="noindex"             - exclude from index, follow links
     content="noindex, nofollow"   - exclude from index, ignore links
     content="noindex, follow"     - same as "noindex" (follow is default)
-->

The critical implementation detail: Googlebot must be able to crawl the page to see this tag. If you also have a Disallow: rule for the same URL in robots.txt, Google never fetches the page, never sees the noindex, and the URL stays indexed. We will come back to this in detail in the hierarchy section below.

In WordPress and most CMS platforms, the meta robots tag is exposed as a per-page setting (Yoast and Rank Math both have a noindex toggle on every post). In frameworks like Next.js, you set it through the page's metadata; for fully static sites, add the tag directly to the template for the URL pattern you want to exclude.
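
For example, in a Next.js project using the App Router (version 13 or later), a per-page noindex can be expressed through the framework's metadata API. Here is a minimal sketch; the route path is chosen purely for illustration:

// app/search/page.tsx  (illustrative path)
import type { Metadata } from 'next';

// Renders <meta name="robots" content="noindex, follow"> into the page <head>
export const metadata: Metadata = {
  robots: {
    index: false,
    follow: true,
  },
};

export default function SearchResultsPage() {
  return <main>{/* internal search results */}</main>;
}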

X-Robots-Tag HTTP Header for Non-HTML Resources

Meta robots only works for HTML pages because it lives inside an HTML <head>. For PDFs, images, and other non-HTML resources you want to keep out of the index, use the X-Robots-Tag HTTP response header. It supports the same directives as meta robots, but you set it at the server level.

# nginx: add X-Robots-Tag noindex to all PDFs
location ~* \.pdf$ {
  add_header X-Robots-Tag "noindex, nofollow";
}

# nginx: noindex an entire directory
location /private/ {
  add_header X-Robots-Tag "noindex";
}

# Apache .htaccess: noindex all PDFs and DOCX files (requires mod_headers)
<FilesMatch "\.(pdf|docx)$">
  Header set X-Robots-Tag "noindex, nofollow"
</FilesMatch>

# Sample HTTP response with the header set:
# HTTP/1.1 200 OK
# Content-Type: application/pdf
# X-Robots-Tag: noindex, nofollow

# Express.js / Node.js example (middleware covers every request under /private/)
app.use('/private', (req, res, next) => {
  res.setHeader('X-Robots-Tag', 'noindex, nofollow');
  next();
});

The X-Robots-Tag is the closest thing to a "robots.txt-style" bulk noindex that actually works — you can apply it at the server-block level to entire directories or file extensions, without modifying every individual page or document.

For password-protected or authenticated areas, you generally do not need either approach: Google cannot access the content behind the auth wall, so it cannot index the body. However, if the URLs themselves leak (via referrer headers, public links, or sitemap mistakes), pair authentication with an X-Robots-Tag: noindex on the login redirect to be safe.
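
As a sketch of that pairing in Express, assuming a hypothetical isAuthenticated() helper and /login path that stand in for your own auth setup:

// Auth-walled section: every response under /internal, including the
// redirect to the login page, carries an X-Robots-Tag noindex.
// isAuthenticated() and the paths are placeholders for your own setup.
app.use('/internal', (req, res, next) => {
  res.setHeader('X-Robots-Tag', 'noindex, nofollow');
  if (!isAuthenticated(req)) {
    return res.redirect('/login'); // the 302 itself carries the header
  }
  next();
});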

Why Disallow is NOT a Substitute for Noindex

This is the second-most-common robots.txt mistake. People know Noindex: in robots.txt does not work, so they reach for Disallow: instead and assume that blocking the crawl is the same as preventing indexing. It is not.

Disallow blocks crawling. It tells Googlebot not to fetch the URL. noindex blocks indexing. It tells Google not to include the URL in search results. These are different operations.

If a URL is disallowed in robots.txt but has links pointing to it from other sites (or from your own sitemap), Google can still index the URL. The SERP listing will look like this:

# What a Disallow'd-but-indexed URL looks like in Google SERPs:

example.com/private/internal-doc
https://example.com/private/internal-doc
No information is available for this page. Learn why

# That "No information is available" line is the give-away.
# Google has the URL in its index, but cannot fetch the body
# because robots.txt blocks the crawl. The URL still ranks
# (weakly) and still shows up for site: queries.

# Common GSC label for this state:
# "Indexed, though blocked by robots.txt"

You will see this state reported in Google Search Console under Indexing → Pages → "Indexed, though blocked by robots.txt." That is GSC literally telling you Disallow did not prevent indexing. The fix is never to add another Disallow rule — it is to remove the Disallow, let Google crawl the page, and serve a meta robots noindex or X-Robots-Tag noindex on the page itself.

The Hierarchy Trap: Disallow + Noindex Cancel Each Other Out

Here is the failure mode that catches the most sites. You want a section deindexed, so you do both: add a Disallow: in robots.txt and a meta robots noindex on every page in that section. It feels like a belt-and-braces approach. It is actually a guaranteed failure.

The sequence Google follows:

1. Google reads robots.txt. It sees Disallow: /private/. It will not fetch any URL under that path.

2. A backlink, sitemap entry, or internal link points Google at /private/page-1. Google adds the URL to its index queue.

3. Google checks robots.txt again. The URL is disallowed, so Googlebot does not fetch the page body.

4. Google never sees the <meta name="robots" content="noindex"> tag, because the page was never fetched.

5. The URL gets indexed (without a snippet) based on the link signals alone. The noindex tag — which would have worked — is invisible to Google.

The page stays indexed indefinitely. The only way out is to remove the Disallow rule so Google can crawl the page, see the noindex tag, and process the deindexation. Once the page is fully out of the index (typically 2–6 weeks), you can optionally re-add a Disallow rule to save crawl budget — but only if you no longer care about the noindex signal being seen on subsequent crawls.
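
In robots.txt terms, the two phases look roughly like this (the /private/ path is carried over from the example above):

# Phase 1 - while the pages still need to be deindexed:
User-agent: *
# Disallow: /private/   <- removed (or commented out) so Googlebot can fetch
#                          the pages and see their meta robots noindex tags

# Phase 2 (optional) - only after GSC confirms the URLs are out of the index:
# User-agent: *
# Disallow: /private/   <- re-added purely to save crawl budget, accepting that
#                          Google will no longer see the noindex on future crawls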

Recovery: Migrating Sites Still Using robots.txt Noindex

If you discover your site has been relying on Noindex: in robots.txt — particularly common on sites that have not been audited since 2018 or so — here is the migration path:

Step 1: Inventory every URL pattern your robots.txt was trying to noindex. Open your robots.txt, copy out every Noindex: line, and list the path patterns. These are the URLs that have been silently indexable since 2019.

Step 2: Confirm the indexation status of each pattern. Run site:yourdomain.com/path/ queries in Google for each path. Cross-reference with the GSC Pages report — filter by URL prefix to see how many URLs in each pattern are indexed.

Step 3: Choose the correct directive for each pattern. HTML pages → meta robots noindex. PDFs and other non-HTML → X-Robots-Tag header. Internal admin areas → ideally HTTP authentication plus X-Robots-Tag.

Step 4: Remove the Noindex lines from robots.txt and remove any conflicting Disallow rules. The Disallow rules are the bigger trap — if you leave them in place, Google cannot see the new noindex tags.

Step 5: Deploy the noindex tags or headers, then submit affected URLs in GSC URL Inspection. For high-priority URLs, click "Request Indexing" to trigger a recrawl. Lower-priority URLs will be naturally recrawled within 1–6 weeks depending on their importance.

# BEFORE (broken - Noindex in robots.txt)
User-agent: *
Disallow: /admin/
Noindex: /tag/
Noindex: /author/
Noindex: /search/

# AFTER (correct robots.txt)
User-agent: *
Disallow: /admin/
# (Noindex lines removed - tag/author/search now noindexed via meta tags)
# Note: /tag/, /author/, /search/ MUST be crawlable so Google can
# see their meta robots noindex tags. Do NOT add Disallow for them.

Sitemap: https://example.com/sitemap.xml

# Then on every /tag/, /author/, /search/ page, add to <head>:
# <meta name="robots" content="noindex, follow">

# Verify deployment:
curl -s https://example.com/tag/example | grep -i 'name="robots"'
# Expected output:
# <meta name="robots" content="noindex, follow">

Track GSC's "Indexed" count for the affected URL patterns weekly. A correct migration will show the indexed count declining steadily over 3–8 weeks. If counts stay flat after 4 weeks, check that Googlebot is actually fetching the URLs (URL Inspection → Live Test) and that no Disallow rule is silently blocking the recrawl.

Validation Tools and Verification

Before assuming a noindex implementation is working, verify with these tools:

Google Search Console URL Inspection. Enter the URL, click "Test Live URL," and check the "Indexing allowed?" field. If it says "No: 'noindex' detected in 'robots' meta tag," the implementation is correct. If it says "Yes" despite your having added the tag, either the tag is not in the rendered HTML or a Disallow is blocking the fetch.

curl for raw HTML inspection. Run curl -s https://example.com/page | grep -i 'name="robots"' to confirm the meta robots tag is present in the server-rendered HTML, not just injected by client-side JavaScript. JS-only injection is a frequent cause of noindex tags being invisible to first-wave crawls.

curl for X-Robots-Tag headers. Run curl -I https://example.com/document.pdf and check for the X-Robots-Tag: header in the response. If absent, your server config did not match the URL pattern.
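
Putting the two curl checks side by side, with placeholder URLs you would swap for your own:

# 1. Server-rendered HTML check - what first-wave crawls see:
curl -s https://example.com/tag/example | grep -ic 'name="robots"'
#    A count of 0 means the meta tag is missing from the raw HTML. If the tag
#    still appears in the browser's Elements panel, it is being injected by
#    client-side JavaScript and may not be seen until Google renders the page.

# 2. Response-header check for non-HTML resources:
curl -sI https://example.com/files/report.pdf | grep -i 'x-robots-tag'
#    No output means the header is not being set for that URL pattern.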

SitemapFixer or Screaming Frog crawl. Bulk-check noindex coverage across your site. Both tools surface every page's indexability status, the meta robots value, and any conflicts with robots.txt — letting you confirm an entire URL pattern is correctly noindexed in one pass instead of spot-checking individual pages.

The robots.txt tester. Use Google's robots.txt report (in GSC) to confirm your robots.txt parses cleanly. The tester will not flag Noindex: as an error — it just ignores it — so a clean tester result does not mean your noindex strategy is working. Always pair the robots.txt check with a URL Inspection on a sample affected URL.

The Short Version

If you remember nothing else from this guide, remember three rules:

1. Noindex: in robots.txt has not worked since September 1, 2019. It is silently ignored.

2. Disallow: blocks crawling, not indexing. Disallowed URLs can still appear in search results without snippets.

3. Disallow + meta robots noindex on the same URL = page stays indexed forever. Google must be able to crawl the page to see the noindex tag. Never combine the two on a URL you want deindexed.

For HTML pages, use <meta name="robots" content="noindex">. For non-HTML resources, use the X-Robots-Tag HTTP header. For private/auth-walled content, use authentication. In none of those cases does robots.txt do the indexing work for you.
