WordPress Robots.txt: The Complete Guide
The WordPress robots.txt file tells search engines which URLs they may crawl on your site. WordPress generates a virtual robots.txt by default, but you can override it with Yoast, Rank Math, All in One SEO, or by uploading a real file via FTP. This guide shows you exactly how, plus the common WordPress robots.txt mistakes that break indexing.
What robots.txt does in WordPress
Robots.txt is a plain text file at the root of your domain (yourdomain.com/robots.txt) that gives crawling instructions to bots. It does not control indexing - a URL blocked by robots.txt can still be indexed if other sites link to it. For indexing control, use a meta robots noindex tag on the page itself. WordPress treats robots.txt as a traffic cop for crawlers: use it to keep bots out of admin, search results, and feed noise, not to hide pages from search.
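If indexing is what you want to control, do it at the page level. Here is a minimal sketch (assuming WordPress 5.7+, which added the wp_robots filter) that marks on-site search results noindex - is_search() is just an example target, so swap the conditional for whatever pages you actually want kept out:

// Sketch: add a meta robots noindex via the wp_robots filter (WordPress 5.7+).
// is_search() is only an example - adjust the conditional to your own pages.
add_filter('wp_robots', function (array $robots) {
    if (is_search()) {
        $robots['noindex'] = true;
        $robots['follow']  = true;
    }
    return $robots;
});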
Default WordPress robots.txt
If you have not uploaded a file or installed an SEO plugin, WordPress serves this virtual robots.txt out of the box:
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
This is generated by WordPress core and served dynamically - there is no file on disk. The moment you drop a real robots.txt into your web root, the virtual one is replaced.
How to edit robots.txt via Yoast, Rank Math, and All in One SEO
Yoast SEO: Go to Yoast SEO > Tools > File editor. Click "Create robots.txt file" if it does not exist yet, edit the contents, and save. Yoast writes a real file to your server root.
Rank Math: Go to Rank Math > General Settings > Edit robots.txt. Toggle it on and edit inline. Rank Math serves it virtually unless a physical file exists.
All in One SEO: Go to All in One SEO > Tools > Robots.txt Editor. Enable the custom robots.txt toggle and add rules using the visual builder or raw editor.
How to edit robots.txt manually via FTP
Connect to your site with FileZilla or your host's file manager. Navigate to the web root (the folder that contains wp-config.php and wp-content/). Create a plain text file named robots.txt, paste your rules, and upload. Confirm it is live by visiting yourdomain.com/robots.txt - the file should render in your browser. A real file always overrides the virtual WordPress one and any SEO plugin's version.
Common WordPress robots.txt mistakes
- Blocking /wp-content/uploads/ - this hides all your images and media from Google Images. Never disallow the uploads folder.
- Blocking /wp-content/ or /wp-includes/ entirely - Googlebot needs CSS and JS from these folders to render your pages. Blocking them breaks mobile-friendly rendering.
- Blocking /feed/ when other sites and readers rely on your RSS feed - and blocking it does not help SEO either way.
- Using wildcards incorrectly - Disallow: /*?* blocks every URL with a query string, including legitimate filter and pagination URLs (see the example after this list).
- Adding a noindex directive - Google stopped supporting noindex in robots.txt in 2019. Use a meta robots tag instead.
- Forgetting the Sitemap line - always declare your sitemap URL so crawlers can find it.
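To make the wildcard point concrete, here is the kind of substitution I mean - the narrower rules below are only illustrations (replytocom is WordPress's comment-reply query parameter), so match them to the query strings your own site actually generates:

# Too broad: blocks every URL that carries a query string
Disallow: /*?*

# Narrower: only the parameters you actually want crawlers to skip
Disallow: /*?replytocom=
Disallow: /*?s=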
Recommended WordPress robots.txt
A safe, modern default for a WordPress blog or business site:
User-agent: *
Disallow: /wp-admin/
Disallow: /wp-login.php
Disallow: /xmlrpc.php
Disallow: /?s=
Disallow: /search/
Allow: /wp-admin/admin-ajax.php
Allow: /wp-content/uploads/
Sitemap: https://yourdomain.com/sitemap.xml
Sitemap: https://yourdomain.com/sitemap_index.xml
Testing robots.txt in Google Search Console
Open Search Console and use the Settings > robots.txt report to see the latest fetched version, timestamp, and any parsing errors. You can also use the URL Inspection tool on a specific page - it will tell you if a URL is blocked by robots.txt. After editing, click "Request a recrawl" so Google picks up the new file within a few hours instead of days.
How the virtual robots.txt actually works
When a bot requests yourdomain.com/robots.txt, WordPress intercepts the request through its rewrite rules. The do_robots() function in wp-includes/functions.php generates the response on the fly, then fires the do_robotstxt action hook. This matters because you can modify the virtual file without ever touching the filesystem - just hook into it from functions.php or a small must-use plugin:
// Add to functions.php or an mu-plugin - extends the virtual robots.txt
add_filter('robots_txt', function ($output, $public) {
    // If "Discourage search engines" is on, WordPress already outputs
    // Disallow: / - leave that output untouched.
    if ('1' != $public) {
        return $output;
    }

    $extra  = "Disallow: /?s=\n";
    $extra .= "Disallow: /search/\n";
    $extra .= "Disallow: /wp-login.php\n";
    $extra .= "Disallow: /xmlrpc.php\n";
    $extra .= "Sitemap: " . home_url('/sitemap.xml') . "\n";

    return $output . $extra;
}, 10, 2);

I prefer this approach over a physical file on sites where I want the rules version-controlled with the code. The catch: if the site is marked as discouraging search engines (Settings > Reading), WordPress forces Disallow: / regardless of any filter. Check that setting first if your rules "don't work."
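A quick way to confirm that setting without clicking through wp-admin - a minimal sketch using the standard blog_public option (the log message is just an example):

// blog_public is '0' when "Discourage search engines from indexing this site" is ticked.
if ('0' === (string) get_option('blog_public')) {
    error_log('WordPress is forcing Disallow: / - custom robots.txt rules will not apply.');
}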
A real case: the 47-entry duplicate sitemap mess
I audited a WordPress site last month whose robots.txt listed 47 different Sitemap lines. Each SEO plugin they'd tried over the years (Yoast, then Rank Math, then SEOPress, then back to Yoast) had left its own entry. Every plugin switch added a new sitemap URL without removing the old ones.
Google was hitting 40+ nonexistent sitemap URLs on every crawl, wasting crawl budget. In GSC, the Sitemaps report showed 12 sitemaps as "Couldn't fetch," which was dragging down trust in the legitimate ones. We deleted the physical robots.txt, switched to the filter-based approach above, and GSC quietly cleaned up within a week.
If you've switched SEO plugins more than once, curl https://yoursite.com/robots.txt right now. I bet you'll find leftovers.
Plugin-specific gotchas
Yoast. Yoast's file editor writes a physical robots.txt to the root. If you've previously been using a filter hook or the virtual file, Yoast silently overrides them. To go back, delete the physical file - Yoast doesn't give you a "reset" button.
Rank Math. Rank Math keeps the file virtual unless a physical one exists. Edits happen in the database via wp_options. Useful on managed hosts where file editing is locked, but confusing if you're also FTP'ing up a real file - the real file wins and Rank Math's database edits do nothing.
All in One SEO. The visual rule builder is nice, but it aggressively adds its own preamble and sitemap declaration. Auditing sites that use it, I regularly find the Sitemap: line pointing to /sitemap.xml when the actual sitemap is at /sitemap_index.xml. Double-check the output at /robots.txt after any change.
Edge cases you'll eventually hit
Multisite. Each subsite needs its own robots.txt. WordPress multisite generates separate virtual files per subdomain or subdirectory. Don't upload a single physical robots.txt to the network root - it'll override every site's virtual file with the same rules.
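If you do need network-wide customisation, the filter approach from earlier still works - the robots_txt filter runs in each subsite's context, so home_url() resolves per site. A rough sketch for an mu-plugin (the /private-area/ path is purely hypothetical):

// mu-plugin sketch for multisite: each subsite gets its own sitemap line
// because home_url() resolves to the site currently being requested.
add_filter('robots_txt', function ($output, $public) {
    if ('1' != $public) {
        return $output;
    }
    $output .= "Sitemap: " . home_url('/sitemap.xml') . "\n";

    // Hypothetical per-site variation: tighter rules on subsites only.
    if (!is_main_site()) {
        $output .= "Disallow: /private-area/\n";
    }
    return $output;
}, 10, 2);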
Staging environments. WP Engine, Kinsta, and SiteGround staging sites often ship with Disallow: / to prevent accidental indexing. When you push staging to production, check robots.txt first. I've seen six-figure traffic drops because someone cloned a blocked staging file to live.
WooCommerce. Don't block /cart/, /checkout/, or /my-account/. These pages carry noindex already, but Google needs to reach them to render product pages that reference cart state. Blocking them can break structured data rendering on products.
CDN caching. Cloudflare and similar CDNs cache robots.txt for minutes or hours. After editing, purge the cached URL specifically - otherwise Google may fetch the cached version for hours after your update.
Diagnosing robots.txt issues fast
# Does a physical file exist? Check the response headers:
# a static file typically shows Content-Length and Last-Modified,
# while WordPress's virtual version is generated by PHP on the fly
curl -sI https://yoursite.com/robots.txt

# Count Sitemap: lines (should be 1-2 max)
curl -s https://yoursite.com/robots.txt | grep -c '^Sitemap:'

# Test that a specific path is blocked as expected
curl -s https://yoursite.com/robots.txt | grep -iE 'wp-admin|uploads'
In GSC, the Settings > robots.txt report now shows the exact version Google last fetched with a timestamp. If that timestamp is older than a day and you just made a change, use the three-dot menu to request a recrawl.