By SitemapFixer Team
Updated April 2026

Sitemap vs robots.txt: How They Work Together


Every website that wants to rank on Google should have both a sitemap and a robots.txt file. They do not do the same thing; they are complementary tools that work together. Getting either one wrong can quietly hurt your indexing, often with no obvious error message to alert you.

What a Sitemap Does

A sitemap is a list of URLs you want Google to crawl and index. It is an invitation: you are telling Google "here are the pages on my site, please visit them." A sitemap can also carry metadata about each URL: when the page was last updated (lastmod), how often it changes (changefreq), and a relative priority. Google uses an accurate lastmod value to prioritize crawling, though it has said it largely ignores changefreq and priority. A sitemap does not guarantee indexing; it only helps Google discover and prioritize your pages, and Google can still choose not to index a page even if it is in your sitemap.
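For reference, here is what a single sitemap entry with all three metadata fields looks like (the URL and values are placeholders):

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://yoursite.com/blog/example-post</loc>
    <lastmod>2026-04-01</lastmod>
    <changefreq>monthly</changefreq>
    <priority>0.8</priority>
  </url>
</urlset>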

What robots.txt Does

Robots.txt tells crawlers which pages they are not allowed to access. It is a gate — you are saying "do not enter here." The file lives at yoursite.com/robots.txt and uses simple rules: User-agent specifies which crawler the rule applies to (use * for all), and Disallow specifies which paths to block. Critically, robots.txt only controls crawling, not indexing. A page blocked in robots.txt can still be indexed if Google learns about it through a link from another site — it just cannot read the content.
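For example, this two-line rule set blocks every crawler from a /private/ directory while leaving the rest of the site open (/private/ is a placeholder path):

User-agent: *
Disallow: /private/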

The Critical Conflict to Avoid

Never put a URL in your sitemap that is also blocked in robots.txt. This is a direct contradiction: your sitemap says "index this" and robots.txt says "you cannot even crawl it." Google will report this as an error in Search Console under Sitemaps. The URL cannot be indexed because Googlebot cannot access the content. Either remove the URL from your sitemap, or remove the robots.txt block — depending on whether you want the page indexed.
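If you want to check for this conflict yourself, here is a rough Python sketch using only the standard library. The two URLs are placeholders, and it assumes a flat sitemap rather than a sitemap index; note that Python's robotparser follows the basic exclusion standard, so its matching may not line up exactly with Google's wildcard handling.

import urllib.request
import xml.etree.ElementTree as ET
from urllib.robotparser import RobotFileParser

SITEMAP_URL = "https://yoursite.com/sitemap.xml"  # placeholder
ROBOTS_URL = "https://yoursite.com/robots.txt"    # placeholder

# Load and parse the robots.txt rules.
robots = RobotFileParser()
robots.set_url(ROBOTS_URL)
robots.read()

# Fetch the sitemap and pull out every <loc> URL.
with urllib.request.urlopen(SITEMAP_URL) as response:
    tree = ET.parse(response)
ns = "{http://www.sitemaps.org/schemas/sitemap/0.9}"
urls = [loc.text.strip() for loc in tree.iter(ns + "loc")]

# Flag any sitemap URL that robots.txt blocks for all crawlers.
for url in urls:
    if not robots.can_fetch("*", url):
        print("CONFLICT: in sitemap but blocked by robots.txt:", url)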

What to Put in Your Sitemap

Include only pages you want indexed: your homepage, published blog posts, product pages, category pages, landing pages, and key content pages. Exclude: admin pages, cart and checkout, search results, user account pages, paginated pages beyond page 2–3, and any page with a noindex tag. Your sitemap should be a curated list of your best content, not an exhaustive dump of every URL on your domain.
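As a sketch, a curated sitemap for a small site might contain only a handful of entries like these (all URLs and dates are placeholders):

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://yoursite.com/</loc><lastmod>2026-04-10</lastmod></url>
  <url><loc>https://yoursite.com/blog/launch-announcement</loc><lastmod>2026-03-28</lastmod></url>
  <url><loc>https://yoursite.com/products/widget</loc><lastmod>2026-02-14</lastmod></url>
</urlset>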

What to Block in robots.txt

Block pages that should never be crawled: admin dashboards (/admin/, /wp-admin/), internal search results (/search?), staging or dev areas, user-generated private content, API endpoints, and pages with infinite URLs from session IDs or tracking parameters. Blocking these saves crawl budget for your important pages. A well-maintained robots.txt on a large site can dramatically improve how efficiently Google crawls your content.
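Translated into rules, that advice might look like the block below. Every path here is an example you would adapt to your own site; the * wildcard in the last line is supported by major crawlers such as Googlebot and Bingbot.

User-agent: *
Disallow: /admin/
Disallow: /wp-admin/
Disallow: /search?
Disallow: /staging/
Disallow: /api/
Disallow: /*?sessionid=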

Add Your Sitemap URL to robots.txt

Best practice is to reference your sitemap URL at the bottom of robots.txt:

User-agent: *
Disallow: /admin/
Disallow: /checkout/
Allow: /

Sitemap: https://yoursite.com/sitemap.xml

This helps any crawler that reads robots.txt — including Bing, DuckDuckGo, and others — find your sitemap automatically without needing to submit it manually to each search engine.
