By SitemapFixer Team
Updated April 2026

Robots.txt Disallow Directory: Syntax, Examples, and Common Mistakes

Blocking a directory in robots.txt is one of the most common crawl management tasks, and one of the most frequently misconfigured. A missing trailing slash, an overly broad wildcard, or a Disallow rule on a directory full of CSS and JavaScript files can quietly break your site's indexing. This guide covers the exact syntax, real-world examples, and the most common mistakes to avoid.

Check which directories are blocked on your site
Free robots.txt and sitemap analysis in 60 seconds
Analyze My Site Free

Basic Disallow Syntax for a Directory

To block a directory in robots.txt, use the Disallow directive followed by the path with a trailing slash. The trailing slash matters: it limits the rule to that directory and everything below it, rather than every path that merely begins with the same characters.

User-agent: *
Disallow: /admin/

This single rule blocks Googlebot and all other crawlers from accessing /admin/, /admin/users, /admin/settings/email, and every other URL that starts with /admin/.

Without the trailing slash — Disallow: /admin — the rule technically still blocks /admin/, but it also matches unintended paths like /administrator/, /admin-panel/, and /admin-old/. Always use the trailing slash when your intent is to block a specific directory and nothing more.
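If you want to see the difference concretely, the prefix behavior is easy to simulate. The short Python snippet below is a rough illustration (the sample paths are invented); it mirrors the simple "URL starts with the rule path" matching that applies to rules without wildcards.

# Rough illustration of prefix matching for rules without wildcards.
# The sample paths are hypothetical.
def is_blocked(url_path: str, rule: str) -> bool:
    return url_path.startswith(rule)

for path in ["/admin/", "/admin/users", "/administrator/", "/admin-panel/", "/about/"]:
    no_slash = "blocked" if is_blocked(path, "/admin") else "allowed"
    with_slash = "blocked" if is_blocked(path, "/admin/") else "allowed"
    print(f"{path:18} /admin -> {no_slash:8} /admin/ -> {with_slash}")

Running it shows /administrator/ and /admin-panel/ caught by the slash-less rule but left alone once the trailing slash is added.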

You can also append a * wildcard. For Google and Bing the two forms below behave identically; some people prefer the wildcard because it makes the "everything under this directory" intent explicit:

# Equivalent forms — both block /admin/ and everything under it
Disallow: /admin/
Disallow: /admin/*

How to Disallow Multiple Directories

Add one Disallow line per directory within the same User-agent block. There is no syntax for listing multiple paths on a single line — each rule must be on its own line.

User-agent: *
Disallow: /admin/
Disallow: /account/
Disallow: /cart/
Disallow: /checkout/
Disallow: /search/
Disallow: /staging/

Each rule is processed independently. A crawler checks whether the requested URL matches any Disallow rule in the active User-agent block. If any rule matches, the URL is blocked. Blank lines between Disallow directives within the same group are allowed but ignored — they do not start a new group. A new group only begins with a new User-agent line.

Disallow Specific File Types Within a Directory

You can block specific file extensions within a directory using the * wildcard. This is useful when a directory contains both crawlable HTML pages and private files you want to hide from crawlers.

# Block all PDF files inside /documents/
User-agent: *
Disallow: /documents/*.pdf

# Block all .xlsx files anywhere on the site
User-agent: *
Disallow: /*.xlsx$

# Block .csv exports inside /exports/ directory
User-agent: *
Disallow: /exports/*.csv

The $ at the end of a pattern anchors the match to the end of the URL. Disallow: /*.pdf$ blocks URLs that end in .pdf — so /files/report.pdf is blocked but /files/report.pdf?download=1 is not (because the URL does not end in .pdf). Without the $, Disallow: /*.pdf would block both.

Disallow All but Allow a Specific Subdirectory

The Allow directive overrides a broader Disallow rule for specific paths. This is the correct way to block an entire directory while keeping one subdirectory crawlable.

# Block all of /admin/ but allow the public status page
User-agent: *
Disallow: /admin/
Allow: /admin/status/

# Block all of /account/ but allow the signup and login pages
User-agent: *
Disallow: /account/
Allow: /account/login
Allow: /account/signup

Rule order matters less than most people expect. Google does not process rules top to bottom and stop at the first match. Instead, when multiple rules match a URL, Google applies the most specific rule: the one with the longest matching path.

For the URL /admin/status/ above, both Disallow: /admin/ (7 matching characters) and Allow: /admin/status/ (14 matching characters) apply. The Allow wins because it is longer. If two matching rules are the same length, Allow wins over Disallow. You can therefore put your Allow lines before or after the Disallow; specificity, not order, determines precedence for Googlebot.
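To make the precedence rule concrete, here is a minimal Python sketch of that longest-match logic. It is an illustration, not Google's parser: it assumes a single user-agent group and plain path rules with no wildcards.

# Sketch of longest-match precedence for plain (wildcard-free) rules.
# Assumption: one user-agent group; Allow wins ties, per Google's documented behavior.
RULES = [
    ("disallow", "/admin/"),
    ("allow", "/admin/status/"),
]

def is_allowed(url_path: str, rules=RULES) -> bool:
    best_len = -1
    best_verdict = True  # no matching rule means the URL is crawlable
    for kind, path in rules:
        if url_path.startswith(path) and len(path) >= best_len:
            # Overwrite only if this rule is strictly longer, or ties as an Allow.
            if len(path) > best_len or kind == "allow":
                best_len = len(path)
                best_verdict = (kind == "allow")
    return best_verdict

print(is_allowed("/admin/status/"))  # True: Allow /admin/status/ is the longest match
print(is_allowed("/admin/users"))    # False: only Disallow /admin/ matches
print(is_allowed("/blog/post-1"))    # True: no rule matches

Because the function compares match lengths rather than rule positions, swapping the two entries in RULES does not change any of the results.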

Common Directories to Block

These are the directories that almost every site should block from crawlers. They consume crawl budget without contributing indexable value, and some expose functionality that should never turn up in search results. Keep in mind that robots.txt is itself a public file and is not an access control: anything genuinely sensitive still needs authentication, not just a Disallow rule.

User-agent: *

# CMS and administration
Disallow: /wp-admin/
Disallow: /admin/
Disallow: /administrator/
Disallow: /cms/

# Allow wp-admin/admin-ajax.php (needed by some themes/plugins on frontend)
Allow: /wp-admin/admin-ajax.php

# Ecommerce transactional pages
Disallow: /cart/
Disallow: /checkout/
Disallow: /order-received/
Disallow: /my-account/
Disallow: /account/

# Internal search results
Disallow: /search/
Disallow: /search?
Disallow: /?s=

# Thank you and confirmation pages
Disallow: /thank-you/
Disallow: /confirmation/

A note on /wp-admin/: WordPress already includes Disallow: /wp-admin/ in its default virtual robots.txt, along with Allow: /wp-admin/admin-ajax.php. If you are overriding the default with a custom file, you must include the admin-ajax.php exception yourself. Many themes and plugins make frontend requests to /wp-admin/admin-ajax.php, and if Googlebot cannot access it, it may not be able to render your pages fully.

Wildcards in Directory Disallow Rules

Google supports two wildcards in robots.txt: * and $.

  • * (asterisk) — matches zero or more of any character. It can appear anywhere in the path string.
  • $ (dollar sign) — anchors the pattern to the end of the URL. Only valid at the end of the pattern string.

# Block all URLs containing /page/ in any position
Disallow: /*/page/

# Block URLs with query parameters in /products/
Disallow: /products/*?*

# Block all .json files anywhere on the site
Disallow: /*.json$

# Block URLs that end in ?print=1 (print-friendly versions)
Disallow: /*?print=1$

# Block paginated archives like /category/news/page/2/
Disallow: /category/*/page/

Bing and other crawlers also support * and $ in the same way. However, some older or less common bots may not support wildcards and will fall back to treating the literal characters as part of the path. For mainstream SEO purposes — Google and Bing — wildcards work exactly as documented above.
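If you want to test URLs against wildcard rules outside of Search Console, the patterns translate cleanly to regular expressions: each * becomes .* and a trailing $ stays an end anchor. The Python sketch below shows that translation for a single rule; it is not a full robots.txt parser and ignores user-agent groups and Allow precedence.

import re

def rule_to_regex(rule: str) -> re.Pattern:
    # Translate one Disallow pattern with * and $ into a regular expression.
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    # Escape regex metacharacters, then turn the escaped \* back into .*
    pattern = re.escape(rule).replace(r"\*", ".*")
    return re.compile("^" + pattern + ("$" if anchored else ""))

print(bool(rule_to_regex("/*.pdf$").match("/files/report.pdf")))                 # True: blocked
print(bool(rule_to_regex("/*.pdf$").match("/files/report.pdf?download=1")))      # False: not blocked
print(bool(rule_to_regex("/category/*/page/").match("/category/news/page/2/")))  # True: blocked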

Blocking a Directory in Robots.txt vs. Noindex

These two mechanisms are frequently confused. They do different things, and they should not be combined on the same URL: if robots.txt blocks the page, Google can never crawl it to see the noindex tag.

  • Disallow in robots.txt — tells crawlers not to visit the URL. The page is not read, so its content, links, and meta tags are unknown to the crawler. Does not prevent the URL from being indexed if it is linked from elsewhere.
  • Noindex meta tag / X-Robots-Tag — tells Google not to include the URL in search results. The page must be crawlable for Google to read and respect the noindex signal. Prevents indexing but allows crawling.

<!-- Noindex on the page — page must NOT be in robots.txt Disallow -->
<meta name="robots" content="noindex">

<!-- X-Robots-Tag HTTP header — alternative to meta tag for non-HTML files -->
X-Robots-Tag: noindex

Use Disallow when you want to protect crawl budget and the page has no link equity value. Use noindex when you want the page crawled (so Google can follow links on it) but not listed in search results — for example, paginated pages past page 2, thin category pages, or duplicate regional variants. Never apply both to the same URL.
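If the pages or files you want de-indexed are served by an application you control, the X-Robots-Tag header can be set in code rather than in templates. The sketch below uses Flask purely as an example (the /internal-reports/ path is hypothetical); any server or framework that can set response headers, including nginx or Apache, can do the same thing.

# Sketch: send X-Robots-Tag: noindex for a hypothetical /internal-reports/ path.
# These URLs stay crawlable (no robots.txt Disallow), so Google can see the header.
from flask import Flask, request

app = Flask(__name__)

@app.route("/internal-reports/<name>")
def report(name):
    return f"Report: {name}"  # placeholder content

@app.after_request
def add_noindex_header(response):
    if request.path.startswith("/internal-reports/"):
        response.headers["X-Robots-Tag"] = "noindex"
    return response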

Does Disallowing a Directory Remove It From Google's Index?

No — and this is the most important thing to understand about robots.txt Disallow. Blocking a directory prevents Googlebot from crawling those URLs, but it does not remove them from the index and it does not prevent them from being indexed in the future.

Here is what actually happens: if another website links to a URL in your blocked directory, Googlebot discovers that URL from the external link signal. Even though it cannot crawl and read the page, Google may still add the URL to its index. It will appear in search results with a generated title and no description snippet — essentially a ghost listing.

If you have existing pages in a blocked directory that are already indexed and you want them removed, you have two options. First, temporarily allow crawling of those pages, add a noindex tag, and wait for Google to recrawl and drop them from the index — then block the directory again once the pages are removed. Second, use the URL Removal Tool in Google Search Console to request temporary removal (it lasts 6 months and must be renewed, or you can make it permanent by keeping the noindex). The removal tool is faster but the noindex approach is the permanent solution.

How to Verify a Directory Is Blocked in Google Search Console

Google retired the standalone robots.txt Tester in 2023, but Search Console still gives you two ways to confirm whether a URL is blocked by your current rules.

  1. Open Google Search Console and select your property.
  2. Go to Settings (gear icon, bottom-left) and open the robots.txt report under Crawling. It lists every robots.txt file Google has found for your site, when it was last fetched, the fetch status, and any parsing errors. Confirm that the file Google is actually using contains the Disallow rules you expect.
  3. To test a specific URL, open the URL Inspection tool, paste the full URL from the directory you want to verify (for example, https://yoursite.com/admin/users), and check the "Crawl allowed?" field in the results. The report also shows the last time Googlebot tried to crawl the page and what it found.

For quick manual verification, fetch your robots.txt in a browser at https://yoursite.com/robots.txt and confirm it returns a 200 status with the correct content. You can also use curl -s https://yoursite.com/robots.txt from the command line. A 404 means Google treats the file as empty (everything crawlable). A 5xx error causes Googlebot to pause crawling your site; if the error persists, Google falls back to its last cached copy of the file.
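That manual check is easy to script as well. The Python sketch below uses only the standard library and a placeholder domain; it fetches robots.txt and reports the status code so the empty-file (404) and server-error (5xx) cases described above are easy to spot.

# Quick robots.txt health check (standard library only; domain is a placeholder).
import urllib.error
import urllib.request

def check_robots(base_url: str) -> None:
    url = base_url.rstrip("/") + "/robots.txt"
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            body = resp.read().decode("utf-8", errors="replace")
            print(f"{url}: HTTP {resp.status}, {len(body.splitlines())} lines")
    except urllib.error.HTTPError as e:
        if e.code == 404:
            print(f"{url}: HTTP 404, treated by Google as an empty file (everything crawlable)")
        elif 500 <= e.code < 600:
            print(f"{url}: HTTP {e.code}, Googlebot pauses crawling while this persists")
        else:
            print(f"{url}: HTTP {e.code}")
    except urllib.error.URLError as e:
        print(f"{url}: unreachable ({e.reason})")

check_robots("https://yoursite.com")  # replace with your own domain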

Common Mistakes When Blocking Directories

These are the mistakes that cause the most damage in practice.

1. Blocking CSS and JavaScript directories

Google needs to access your CSS and JavaScript files to render your pages correctly. If Googlebot cannot load /assets/, /static/, or /_next/, it sees a broken version of your page. That can lead Google to misclassify well-designed pages as thin or low-quality content and to misjudge mobile-friendliness and page experience. Never disallow directories that contain front-end assets.

# WRONG — blocks CSS and JS, breaks page rendering
Disallow: /static/
Disallow: /assets/
Disallow: /_next/
Disallow: /wp-content/

# CORRECT — block only content directories, not asset directories
Disallow: /admin/
Disallow: /checkout/

2. Blocking paginated content you want indexed

A common mistake is blocking /page/ to stop paginated archives from being crawled. If your paginated pages contain unique product listings, articles, or other content that you want indexed, blocking those pages removes them from Google's reach. Only block paginated paths if the content is genuinely duplicate or thin — not just because the pages have "page" in the URL.

3. Missing the trailing slash

# WRONG — also blocks /administrator/, /admin-panel/, /admin-old/
Disallow: /admin

# CORRECT — blocks only /admin/ and its children
Disallow: /admin/

4. Accidentally blocking the entire site

# CATASTROPHIC — blocks all crawlers from every page
User-agent: *
Disallow: /

# What you probably meant
User-agent: *
Disallow: /admin/

Disallow: / with a single forward slash blocks everything. This is valid syntax and Google will obey it: crawling stops, rankings decay, and your pages gradually fall out of search results. Always validate any change to your robots.txt, whether with the Search Console checks described earlier or with a pre-deploy script like the sketch below, before it goes to production.
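A small pre-deploy check can catch this class of mistake automatically. The Python sketch below is illustrative: the file name, the NEVER_BLOCK list, and the single-file assumption are placeholders to adapt to your own setup.

# Pre-deploy sanity check for robots.txt (illustrative; adjust NEVER_BLOCK to your site).
import sys

NEVER_BLOCK = {"/", "/static/", "/assets/", "/wp-content/"}

def find_dangerous_rules(robots_txt: str) -> list:
    problems = []
    agents = []
    last_was_agent = False
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or ":" not in line:
            continue
        field, _, value = line.partition(":")
        field, value = field.strip().lower(), value.strip()
        if field == "user-agent":
            if not last_was_agent:
                agents = []  # a User-agent line that follows rule lines starts a new group
            agents.append(value.lower())
            last_was_agent = True
        else:
            last_was_agent = False
            if field == "disallow" and "*" in agents and value in NEVER_BLOCK:
                problems.append(f"'Disallow: {value}' applies to all crawlers")
    return problems

if __name__ == "__main__":
    issues = find_dangerous_rules(open("robots.txt", encoding="utf-8").read())
    for issue in issues:
        print("DANGER:", issue)
    sys.exit(1 if issues else 0)

Wired into a CI step, the script fails the build whenever a blanket Disallow: / or a blocked asset directory slips into the file.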

5. Blocking the search directory but missing the query-string form

# INCOMPLETE: blocks /search/results/ but not /search?q=term
Disallow: /search/

# CORRECT: covers both the /search/ directory and query-string searches
Disallow: /search/
Disallow: /search?

See exactly which directories are blocked on your site
Free robots.txt audit — check for accidental blocks and sitemap conflicts
Analyze My Site
