Noindex Directives: Complete Reference for Meta Tags and X-Robots-Tag
Noindex is the most powerful directive in the indexing toolkit — and the easiest to misuse. A single misplaced noindex can de-index your highest-traffic page; a missing one can flood Google's index with thank-you pages and faceted URL variants. This guide covers every form a noindex directive can take: the meta robots tag, the X-Robots-Tag HTTP header, bot-specific variations, the combinable directives that pair with noindex (max-snippet, noarchive, unavailable_after, and others), and the implementation patterns for every major CMS and framework.
What the Noindex Directive Actually Does
The noindex directive instructs search engines to fetch a page, read its content, and then deliberately exclude it from their search index. The page can still be crawled — Googlebot must crawl it to see the directive — but it will not appear in search results. After Google processes the noindex, the URL is removed from the index and any existing rankings disappear.
Two important nuances. First, noindex is a directive, not a request: Google, Bing, Yandex, and DuckDuckGo all honor it reliably (unlike, for example, the crawl-delay hint, which Google ignores entirely). Second, noindex requires crawl access. If the URL is blocked by robots.txt, Googlebot never fetches the page, never sees the noindex directive, and may index the URL anyway based on external link signals — showing it in results with a placeholder snippet that says "A description for this result is not available because of this site's robots.txt."
This is the foundational mistake to avoid: if you want a page de-indexed, allow it to be crawled and serve a noindex directive. Do not Disallow it in robots.txt.
The Two Syntactic Forms: Meta Tag and X-Robots-Tag
Noindex can be delivered in exactly two ways: an HTML meta tag inside the <head> of an HTML document, or an HTTP response header (X-Robots-Tag) sent with any response — HTML, PDF, image, or other binary file. Both forms are equally authoritative; pick based on the response type.
Meta robots tag (HTML only):
<!-- Inside <head> of the HTML page you want de-indexed -->
<meta name="robots" content="noindex">

<!-- Equivalent with explicit follow behavior -->
<meta name="robots" content="noindex, follow">

<!-- Most aggressive: do not index, do not follow links, do not cache -->
<meta name="robots" content="noindex, nofollow, noarchive">
X-Robots-Tag HTTP header (any content type):
# Sent in the HTTP response, before the body
HTTP/1.1 200 OK
Content-Type: application/pdf
X-Robots-Tag: noindex

# Combined directives in a single header
X-Robots-Tag: noindex, nofollow, noarchive

# Multiple headers (also valid; equivalent to comma-separated)
X-Robots-Tag: noindex
X-Robots-Tag: nofollow
Use the meta tag for HTML pages where you have template-level control. Use X-Robots-Tag when you need to noindex non-HTML resources (PDFs, images, downloadable files), when you cannot edit the page's HTML, or when you want to apply rules at the server level via pattern matching across many URLs at once.
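Where the web tier is a Node application rather than Apache or nginx (both covered later in this guide), the same pattern-matching rule can live in application middleware. A minimal sketch, assuming Express; the paths and port are illustrative:

// middleware sketch: X-Robots-Tag for non-HTML downloads
import express from 'express';

const app = express();

// Pattern-match many URLs at once, the application-level analogue of an
// Apache <FilesMatch> or nginx location rule
app.use((req, res, next) => {
  if (/\.(pdf|docx?)$/i.test(req.path)) {
    res.setHeader('X-Robots-Tag', 'noindex, noarchive');
  }
  next();
});

app.use(express.static('public')); // serves the matched files
app.listen(3000);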
Bot-Specific Noindex: Targeting Individual Crawlers
The default name="robots" meta applies to all crawlers that recognize the directive. To target a specific bot, replace robots with the bot's user-agent token. The same works for X-Robots-Tag by prefixing the bot name before the directives.
<!-- All bots: do not index this page -->
<meta name="robots" content="noindex">

<!-- Only Googlebot: noindex (other bots may still index) -->
<meta name="googlebot" content="noindex">

<!-- Only Google News: noindex this article from the News surface -->
<meta name="googlebot-news" content="noindex">

<!-- Only Bingbot -->
<meta name="bingbot" content="noindex">

<!-- Combined: index in Google but not in Google News -->
<meta name="robots" content="index, follow">
<meta name="googlebot-news" content="noindex">
X-Robots-Tag uses the same convention with a colon-separated bot prefix:
# Apply noindex only to Googlebot
X-Robots-Tag: googlebot: noindex, nofollow

# Different rules for different bots in the same response
X-Robots-Tag: googlebot: noindex
X-Robots-Tag: bingbot: noarchive
X-Robots-Tag: otherbot: noindex, nofollow
Bot resolution is most-specific-wins. If a page has both robots: index and googlebot: noindex, Googlebot follows the googlebot-specific rule and de-indexes; Bingbot follows the generic rule and indexes. The most common production use case is keeping content out of Google News while still indexing it in main Google search.
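To make the resolution order concrete, here is a simplified TypeScript model of most-specific-wins merging. This is an illustrative assumption about crawler behavior, not a published algorithm; the function and type names are made up:

// resolveRobotsMeta: a speculative model of how a crawler might resolve
// generic vs bot-specific robots meta tags (real internals are not public)
type Directives = Set<string>;

function resolveRobotsMeta(
  metas: { name: string; content: string }[],
  botToken: string, // e.g. 'googlebot' or 'bingbot'
): Directives {
  const parse = (content: string): Directives =>
    new Set(content.split(',').map((d) => d.trim().toLowerCase()));

  // Bot-specific tags take precedence over the generic "robots" tag
  const specific = metas.filter((m) => m.name.toLowerCase() === botToken);
  const applicable = specific.length > 0
    ? specific
    : metas.filter((m) => m.name.toLowerCase() === 'robots');

  // Merge all applicable tags; the most restrictive directive survives
  const merged = new Set<string>();
  for (const m of applicable) parse(m.content).forEach((d) => merged.add(d));
  if (merged.has('noindex')) merged.delete('index');
  if (merged.has('nofollow')) merged.delete('follow');
  return merged;
}

// The example from the text: Googlebot de-indexes, Bingbot indexes
const metas = [
  { name: 'robots', content: 'index, follow' },
  { name: 'googlebot', content: 'noindex' },
];
console.log(resolveRobotsMeta(metas, 'googlebot')); // Set { 'noindex' }
console.log(resolveRobotsMeta(metas, 'bingbot'));   // Set { 'index', 'follow' }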
Combinable Directives: noindex,follow vs noindex,nofollow
Noindex can be combined with other directives in the same content attribute. The two most common pairings are noindex, follow and noindex, nofollow. Historically, noindex, follow was used to remove a page from search results while still letting link equity flow through its outbound links — useful for utility pages like internal search results that linked to indexable content.
The deprecation that nobody documents clearly. John Mueller has stated (and Google has reiterated since) that long-term noindex is treated as noindex, nofollow regardless of the explicit follow directive. Once a page has been noindexed for a sustained period — Google has not given an exact threshold, but observed behavior suggests weeks to a few months — Google stops following its outbound links. The follow directive does not technically "deprecate"; it simply stops being honored once the page has left the index.
Practical implications:
<!-- Short-term: temporarily removed; links still followed -->
<meta name="robots" content="noindex, follow">

<!-- Long-term: same effect as the line below regardless of "follow" -->
<meta name="robots" content="noindex, nofollow">

<!-- If link equity matters, do NOT noindex; use canonical instead -->
<link rel="canonical" href="https://example.com/preferred-version">
If you have pages whose only purpose is to pass link equity to other pages, do not rely on noindex, follow. Either keep the page indexable, or restructure so the equity-passing links live on a page that is genuinely indexable. For details on this decision pattern, see the canonical vs noindex guide.
Snippet and Preview Directives: max-snippet, max-image-preview, max-video-preview
These three directives do not de-index a page — they control how Google displays it in search results. They belong on indexable production pages where you want strict control over the SERP appearance; pairing them with noindex is pointless, because a noindexed page has no search result to control.
<!-- max-snippet: limit characters in the text snippet -->
<meta name="robots" content="max-snippet:160">

<!-- max-image-preview: none | standard | large -->
<meta name="robots" content="max-image-preview:large">

<!-- max-video-preview: max seconds of video preview -->
<meta name="robots" content="max-video-preview:30">

<!-- Combined: typical news publisher configuration -->
<meta name="robots" content="max-snippet:-1, max-image-preview:large, max-video-preview:-1">

<!-- "-1" means no limit; "0" means no snippet/preview at all -->
<meta name="robots" content="max-snippet:0">
Two values to know: -1 means no limit (Google may use as much as it wants), and 0 means no snippet/preview at all. The nosnippet directive is functionally equivalent to max-snippet:0 and is the older form most CMS plugins still emit.
nosnippet, noarchive, noimageindex, unavailable_after
The remaining indexing directives target specific result-page features rather than full indexing status:
nosnippet — Suppresses the text snippet under the result title. The page still indexes; it just appears in results with title and URL only. Useful for paywalled content where the snippet would reveal too much.
noarchive — Disables Google's cached link for the page. Note: Google deprecated the public "cached" link in February 2024, but the directive is still respected for any system that uses cached snapshots (some Google Workspace features, third-party caches that consult this directive). For most sites today, this directive is mostly cosmetic. See robots noarchive for the full timeline.
noimageindex — Prevents images on the page from appearing in Google Images results. The page itself still indexes for web search. This is the correct directive for membership sites that want article copy indexed but member photos kept out of Image Search.
unavailable_after — A scheduled de-indexing trigger. After the specified date, Google removes the page from the index automatically. Date format is RFC 822, RFC 850, or ISO 8601.
<!-- Combined SERP-control directives -->
<meta name="robots" content="nosnippet, noarchive, noimageindex">

<!-- Time-bombed indexing: drops out of the index after the date -->
<meta name="robots" content="unavailable_after: 2026-12-31T23:59:59-08:00">

<!-- Promo page that should auto-expire -->
<meta name="robots" content="index, follow, unavailable_after: 2026-06-01T00:00:00+00:00">
unavailable_after is underused. It is ideal for time-limited landing pages (Black Friday promos, expired job listings, event registration pages) where you want the page indexed during its useful window and automatically de-indexed afterward — no manual cleanup required.
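Because the directive takes a literal timestamp, it pairs naturally with templating. A small sketch with a hypothetical helper (buildExpiryMeta is made up, not a standard API) that renders the tag with an ISO 8601 date:

// buildExpiryMeta is a hypothetical helper for templated expiry tags
function buildExpiryMeta(expiresAt: Date): string {
  // ISO 8601 is one of the date formats Google accepts for unavailable_after
  return `<meta name="robots" content="index, follow, unavailable_after: ${expiresAt.toISOString()}">`;
}

// A Black Friday promo page that should drop out of the index on Dec 1
console.log(buildExpiryMeta(new Date('2026-12-01T00:00:00Z')));
// <meta name="robots" content="index, follow, unavailable_after: 2026-12-01T00:00:00.000Z">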
The Critical Difference: noindex vs robots.txt Disallow
This is the single most-confused topic in technical SEO and the source of more accidental indexation problems than any other. Read this section carefully.
robots.txt Disallow blocks crawling. Googlebot reads robots.txt before fetching any URL on the host. If a URL matches a Disallow rule, Googlebot does not request the URL. It does not download the HTML, does not see meta tags, does not see headers.
Noindex blocks indexing. Googlebot must fetch the URL to see a noindex directive (whether in HTML or HTTP header). After fetching and processing, the URL is excluded from the search index.
Disallow does NOT prevent indexing. If a Disallowed URL accumulates inbound links, Google may index the URL based on external signals (anchor text, link context) without ever crawling it. The result is the famous "Indexed, though blocked by robots.txt" status in Search Console — a URL that appears in search results with no description.
You cannot noindex via robots.txt. Google removed support for the unofficial noindex: directive in robots.txt on September 1, 2019. Any noindex: rule in your robots.txt today is ignored by all major crawlers. See robots.txt Disallow directory for the full mechanics of Disallow.
The combination that does NOT work — and that breaks indexing on countless sites every year:
# robots.txt — Disallows the URL
User-agent: *
Disallow: /private/

# /private/page.html also has:
# <meta name="robots" content="noindex">

# Result: Googlebot never fetches /private/page.html
# It never sees the noindex tag.
# If the URL has external links, it gets indexed anyway with no snippet.

# To actually de-index: REMOVE the Disallow, KEEP the noindex.
Correct decision tree: To remove a page from Google's index, allow crawling and serve noindex. To save crawl budget on URLs you do not care about, Disallow in robots.txt and accept that they may show up in results without snippets if anyone links to them. To do both — keep Google out of the URL entirely — Disallow the URL and ensure no external links exist (which you cannot fully control).
Server-Side Implementation: Apache, nginx, Next.js
X-Robots-Tag is set at the server or framework level. Three production patterns covering most stacks:
Apache .htaccess — noindex all PDFs:
# .htaccess — apply X-Robots-Tag to all PDF and DOC files
<FilesMatch "\.(pdf|doc|docx)$">
  Header set X-Robots-Tag "noindex, noarchive"
</FilesMatch>

# Pattern match: noindex everything under /staging/
# (<LocationMatch> is only valid in the main server or virtual-host
# config, not in .htaccess; place these blocks there)
<LocationMatch "^/staging/">
  Header set X-Robots-Tag "noindex, nofollow"
</LocationMatch>

# Bot-specific: keep Bingbot out of /beta/ entirely
<LocationMatch "^/beta/">
  Header set X-Robots-Tag "bingbot: noindex"
</LocationMatch>
nginx — noindex by URL pattern and content type:
# nginx.conf — noindex all PDF responses
location ~* \.(pdf|doc|docx)$ {
    add_header X-Robots-Tag "noindex, noarchive" always;
}

# noindex an entire path
location /internal/ {
    add_header X-Robots-Tag "noindex, nofollow" always;
    try_files $uri $uri/ /index.html;
}

# Conditional noindex for a staging hostname: use a dedicated server
# block, since add_header is not allowed inside a server-level "if"
server {
    server_name staging.example.com;
    add_header X-Robots-Tag "noindex, nofollow" always;
}

Next.js App Router — metadata.robots:
// app/private/page.tsx
import type { Metadata } from 'next';

export const metadata: Metadata = {
  robots: {
    index: false,
    follow: true,
    nocache: true,
    googleBot: {
      index: false,
      follow: true,
      noimageindex: true,
      'max-video-preview': -1,
      'max-image-preview': 'large',
      'max-snippet': -1,
    },
  },
};
// For X-Robots-Tag headers, use the headers() option in next.config.js:
// async headers() { return [{ source: '/private/:path*',
//   headers: [{ key: 'X-Robots-Tag', value: 'noindex, nofollow' }] }] }

WordPress, Shopify, Wix Implementation
WordPress. WordPress core offers only a site-wide switch: the "Discourage search engines from indexing this site" checkbox (Settings → Reading), which applies noindex, nofollow to every page. Do not use this on production. For per-page control, install Yoast SEO or Rank Math: each post editor gets an "Allow search engines to show this Post in search results?" toggle that emits the meta robots tag. WooCommerce ships with sensible defaults (cart, checkout, and account pages are auto-noindexed). For custom rules, both Yoast and Rank Math allow noindex by post type, taxonomy, and custom URL pattern.
Shopify. Shopify automatically noindexes /account/*, /cart, /checkout, and search result pages. For custom noindex on a specific page, edit theme.liquid and add a conditional in the head:
{% comment %} In theme.liquid <head> {% endcomment %}
{% if template contains 'page.private' or page.handle == 'thank-you' %}
  <meta name="robots" content="noindex, nofollow">
{% endif %}

{% comment %} Noindex collection pages with thin content {% endcomment %}
{% if template == 'collection' and collection.products_count < 3 %}
  <meta name="robots" content="noindex, follow">
{% endif %}

Wix. Wix exposes per-page indexing in the page SEO settings panel: open Page Menu → SEO Basics → toggle "Let search engines index this page." Disabling the toggle emits <meta name="robots" content="noindex"> on that page. Wix does not support custom X-Robots-Tag headers; if you need header-level noindex (for example, on file downloads), Wix is the wrong platform for that requirement.
Common Mistakes and How to Validate
Mistake 1: Noindex on the homepage after a redesign. Staging environments are typically noindexed site-wide. When migrating to production, the noindex meta tag travels with the templates. Symptom: organic traffic falls to zero within a week. Always view-source on your production homepage immediately after deploy and grep for noindex.
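A cheap guardrail is a post-deploy smoke test that fails the pipeline if the homepage ships a noindex. A minimal sketch, assuming Node 18+ for the built-in fetch; the URL is a placeholder:

// check-noindex.ts: post-deploy smoke test (run with: npx tsx check-noindex.ts)
const url = 'https://example.com/'; // replace with your production homepage

const res = await fetch(url);
const html = await res.text();

// Fail on either delivery mechanism: X-Robots-Tag header or meta tag
// (the meta regex is deliberately simple; adjust for attribute order)
const headerHit = (res.headers.get('x-robots-tag') ?? '').includes('noindex');
const metaHit = /<meta[^>]+name=["'](robots|googlebot)["'][^>]*noindex/i.test(html);

if (headerHit || metaHit) {
  console.error(`FAIL: ${url} serves a noindex directive`);
  process.exit(1);
}
console.log(`OK: ${url} is indexable`);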
Mistake 2: Noindex + Disallow. Covered above. The Disallow blocks Googlebot from ever seeing the noindex, leaving the URL eligible for indexing via external links.
Mistake 3: Conflicting directives in the same page. Two meta robots tags with different content values (one from a theme, one from an SEO plugin). Google's behavior: the most restrictive directive wins. noindex beats index; nofollow beats follow.
Mistake 4: Noindex via JavaScript only. A meta tag injected via useEffect or other client-side code is invisible to Googlebot's first-wave crawl. Google may index the page before its second-wave render picks up the directive. Always emit noindex in server-rendered HTML or as an HTTP header.
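As a concrete contrast, a sketch of the fix, assuming an Express server (route and page copy are illustrative): the directive is baked into the initial HTML, so no JavaScript execution is required for Googlebot to see it:

// ssr-noindex.ts: emit noindex in server-rendered HTML, not via useEffect
import express from 'express';

const app = express();

// The meta tag ships in the raw response body, visible to the first-wave crawl
app.get('/thank-you', (_req, res) => {
  res.send(`<!doctype html>
<html>
  <head>
    <meta name="robots" content="noindex, nofollow">
    <title>Thanks</title>
  </head>
  <body>Order received.</body>
</html>`);
});

app.listen(3000);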
Validation commands:
# Check meta robots tag in raw HTML
curl -s https://example.com/page | grep -i 'name="robots"'

# Check X-Robots-Tag in response headers
curl -sI https://example.com/page.pdf | grep -i x-robots-tag

# Check what Googlebot sees (user-agent + raw HTML)
curl -s -A "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" \
  https://example.com/page | grep -iE 'name="(robots|googlebot)"'

# Bulk audit: list URLs and their robots directive
while read URL; do
  DIRECTIVE=$(curl -s "$URL" | grep -ioE 'name="robots"[^>]+content="[^"]+"')
  echo "$URL | $DIRECTIVE"
done < urls.txt
For the canonical validation method, use Google Search Console's URL Inspection tool. The "Indexing allowed?" line tells you whether Google currently sees a noindex directive on the URL, and the "Crawled as" section shows the user-agent that fetched the page. If you suspect a page is incorrectly noindexed, see why pages are not indexed for the diagnostic walkthrough.
Recovery Time After Removing Noindex
Once you remove a noindex directive, Google must recrawl the page to see the change before the URL becomes eligible for indexing again. Timeline expectations:
High-priority URLs (homepage, top-trafficked pages): 1–3 days for recrawl, indexed within the same window. These pages are crawled multiple times per day.
Mid-tier URLs (category pages, regularly updated content): 1–2 weeks for recrawl. Indexing follows within a few days of the recrawl.
Long-tail URLs (older blog posts, deep product pages): 2–6 weeks for recrawl. These have low crawl priority and may not be revisited until Google's scheduled refresh cycle catches them.
To accelerate recovery: open Google Search Console, run URL Inspection on the affected URL, click "Test Live URL" to confirm the noindex is no longer present in the live page, then click "Request Indexing." This pushes the URL into Google's priority crawl queue. Do this for the 5–10 most important URLs; bulk recovery still depends on natural crawl cadence.
Note that ranking recovery is separate from indexing recovery. A URL that has been out of the index for months will be re-indexed quickly, but its ranking history is gone — Google rebuilds ranking signals from scratch as the page accumulates fresh signals. Expect the URL to enter the index within days, but to take 4–8 weeks to return to its previous ranking position.
Related Guides
- X-Robots-Tag: HTTP Header Reference for Indexing Control
- Canonical vs Noindex: Which to Use and When
- Robots Noarchive: What It Does After Cache Deprecation
- Robots.txt Disallow Directory: Patterns and Pitfalls
- Why Pages Are Not Indexed: Diagnostic Guide
- Noindex Follow: When to Use This Robots Directive
- Noindex Nofollow: Combined Robots Meta Directive
- Robots.txt Noindex: Why It No Longer Works
- How to Remove a URL from Google Search Console
- De-Indexing Pages: How to Remove Content from Google
- .htaccess Noindex: Server-Level Index Control