By SitemapFixer Team
Updated April 2026

PerplexityBot: What It Crawls and How to Block It


PerplexityBot is the web crawler operated by Perplexity AI, the AI-powered answer engine. It crawls the web to build the index that Perplexity uses to answer user queries with cited sources. Unlike training-focused crawlers like GPTBot or ClaudeBot, PerplexityBot is primarily a search indexing crawler — its purpose is closer to Googlebot than to AI training bots.

PerplexityBot became controversial in 2024 when researchers and publishers discovered it was crawling websites that had explicitly blocked it in robots.txt. This raised serious questions about Perplexity's robots.txt compliance that persist into 2026. This guide covers the facts, the controversy, and what you can actually do to control PerplexityBot's access.

PerplexityBot Technical Details

Property | Value
Primary user agent token | PerplexityBot
Secondary user agent | Mozilla/5.0 ... (compatible; PerplexityBot/1.0; +https://perplexity.ai/perplexitybot)
Purpose | Search index for Perplexity AI answer engine
Operator | Perplexity AI
robots.txt compliance | Disputed (see controversy section below)

What PerplexityBot Crawls

PerplexityBot crawls publicly accessible web pages to index their content for Perplexity's answer engine. When a user asks Perplexity a question, Perplexity retrieves relevant pages from its index and synthesizes an answer with citations. Your page being in Perplexity's index means your content could appear as a source in Perplexity responses — with attribution links back to your site.

Unlike pure training crawlers, PerplexityBot has a direct and visible effect on whether your site appears as a cited source in Perplexity answers. This is analogous to Googlebot's role in determining whether your page ranks in Google search — except Perplexity displays fewer results and credits sources more prominently.

PerplexityBot crawls text content, metadata, and structured data. It renders JavaScript to varying degrees, so content served as static HTML is indexed more reliably. An XML sitemap can help Perplexity discover your pages, although it is not officially confirmed that every version of PerplexityBot reads sitemaps.

The robots.txt Compliance Controversy

In June 2024, Wired and other publications reported that PerplexityBot was crawling websites that had explicitly blocked it in robots.txt. The reports included technical evidence: server logs showing PerplexityBot user agent strings on sites whose robots.txt contained the lines "User-agent: PerplexityBot" and "Disallow: /".

Perplexity initially denied the reports and said its bots respected robots.txt. Researchers pushed back with log evidence showing that the crawling continued after the robots.txt blocks were in place. They also identified a second mechanism: Perplexity appeared to be using third-party infrastructure (including some residential IP address pools) to fetch pages in ways that bypassed standard user agent checks.

Perplexity subsequently updated its robots.txt policy documentation and committed to stronger compliance. As of 2026, PerplexityBot's compliance is better than it was in 2024 but still less uniformly reliable than that of Googlebot, GPTBot, or ClaudeBot. Some site owners report continued crawling despite Disallow directives.

How to Block PerplexityBot

Add this to your robots.txt to block PerplexityBot:

User-agent: PerplexityBot
Disallow: /

If you want to allow some paths but block others:

# Allow public content, block private areas
User-agent: PerplexityBot
Allow: /learn/
Allow: /blog/
Disallow: /dashboard/
Disallow: /api/
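
Before deploying either rule set, it can help to confirm that the rules parse the way you expect. The following is a minimal sketch using Python's standard-library urllib.robotparser, which evaluates rules locally and makes no network request. The example.com URLs and the build_parser helper are placeholders introduced for illustration, and the result only shows what a compliant crawler should do, not what PerplexityBot actually does.

```python
from urllib.robotparser import RobotFileParser


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text held in memory (hypothetical helper, no network I/O)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


# The two rule sets shown above.
BLOCK_ALL = """\
User-agent: PerplexityBot
Disallow: /
"""

PARTIAL = """\
# Allow public content, block private areas
User-agent: PerplexityBot
Allow: /learn/
Allow: /blog/
Disallow: /dashboard/
Disallow: /api/
"""

block_all = build_parser(BLOCK_ALL)
partial = build_parser(PARTIAL)

# example.com is a placeholder domain; substitute your own URLs.
for label, parser, url in [
    ("block-all", block_all, "https://example.com/blog/post"),
    ("partial", partial, "https://example.com/blog/post"),
    ("partial", partial, "https://example.com/api/v1/users"),
    ("partial", partial, "https://example.com/pricing"),
]:
    verdict = "allowed" if parser.can_fetch("PerplexityBot", url) else "blocked"
    print(f"{label:9} {url} -> {verdict}")
```

Note that in the partial rule set, a path matched by no rule (such as /pricing in this sketch) is allowed by default, so the Allow lines document intent rather than change the outcome.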

Given the documented compliance issues, some site owners add IP-level blocks using their CDN or firewall in addition to robots.txt rules. Perplexity publishes its crawler IP ranges — blocking those IPs provides a harder technical barrier than robots.txt alone.
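
As a rough illustration of how an IP-level check works, the sketch below tests whether a request IP falls inside a list of CIDR ranges using Python's standard ipaddress module. The ranges shown are reserved documentation blocks used as placeholders, not Perplexity's real ranges; you would substitute the ranges Perplexity publishes. The function names are hypothetical.

```python
import ipaddress

# PLACEHOLDER ranges from reserved documentation address space.
# These are NOT Perplexity's ranges; replace them with the published list.
HYPOTHETICAL_RANGES = ["192.0.2.0/24", "198.51.100.0/25", "2001:db8::/32"]


def parse_ranges(cidrs):
    """Convert CIDR strings into network objects (raises ValueError on bad input)."""
    return [ipaddress.ip_network(cidr) for cidr in cidrs]


def is_listed(ip: str, networks) -> bool:
    """Return True if the IP address falls inside any of the given networks."""
    addr = ipaddress.ip_address(ip)
    # Compare only against networks of the same IP version (v4 vs v6).
    return any(addr.version == net.version and addr in net for net in networks)


networks = parse_ranges(HYPOTHETICAL_RANGES)

for ip in ["192.0.2.55", "198.51.100.200", "203.0.113.9", "2001:db8::1"]:
    status = "inside listed ranges" if is_listed(ip, networks) else "not listed"
    print(f"{ip}: {status}")
```

The same membership test is what a CDN or firewall rule performs for you; a script like this is mainly useful for spot-checking IPs pulled from logs.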

Should You Block PerplexityBot?

The decision depends on your content type and business goals:

Reasons to allow PerplexityBot

  • Your site appears as a cited source in Perplexity answers, which drives referral traffic. Perplexity displays source attribution more prominently than Google AI Overviews.
  • Perplexity has a large and growing user base of technical and research-oriented users — the same audience many B2B and SaaS sites want to reach.
  • Being indexed by multiple AI search engines (Perplexity, ChatGPT search, Google) diversifies your traffic sources.

Reasons to block PerplexityBot

  • You have paywalled or subscription content that should not be summarized freely in AI answers.
  • You are concerned about content being reproduced in AI-generated answers without driving click-through (the "zero-click" concern).
  • You object on principle to your content being used to power an AI service without compensation or opt-in.
  • You have verified via logs that PerplexityBot is consuming significant crawl budget without commensurate referral traffic benefit.
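
One way to weigh crawl cost against referral benefit is to count both from the same log data. The sketch below assumes you have already extracted (user agent, referer) pairs from your logs, and that visits referred by Perplexity carry a Referer header on the perplexity.ai domain; both the record shape and the referrer domain are assumptions, and the sample records are made up for illustration.

```python
from urllib.parse import urlparse


def crawl_vs_referrals(records, bot_token="PerplexityBot", referrer_domain="perplexity.ai"):
    """Count crawler requests and referred visits.

    records: iterable of (user_agent, referer) string pairs (assumed shape).
    Returns (bot_hits, referrals).
    """
    bot_hits = 0
    referrals = 0
    for user_agent, referer in records:
        if bot_token.lower() in user_agent.lower():
            bot_hits += 1
            continue
        # hostname is None for values like "-" or an empty Referer.
        host = urlparse(referer).hostname or ""
        if host == referrer_domain or host.endswith("." + referrer_domain):
            referrals += 1
    return bot_hits, referrals


# Made-up sample records, for illustration only.
SAMPLE_RECORDS = [
    ("ExampleUA (compatible; PerplexityBot/1.0)", "-"),
    ("ExampleUA (compatible; PerplexityBot/1.0)", "-"),
    ("ExampleBrowser/1.0", "https://www.perplexity.ai/search?q=sitemaps"),
    ("ExampleBrowser/1.0", "https://www.google.com/"),
    ("ExampleBrowser/1.0", "-"),
]

bot_hits, referrals = crawl_vs_referrals(SAMPLE_RECORDS)
print(f"PerplexityBot requests: {bot_hits}, referred visits: {referrals}")
```

A high ratio of crawler requests to referred visits over a meaningful time window is one signal, though not proof, that the crawl is costing more than it returns.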

PerplexityBot vs. Other AI Crawlers

Bot | Primary use | robots.txt reliability | Citation in results
Googlebot | Search index | Excellent | Yes (AI Overviews)
GPTBot | AI training | Good | Indirect (future models)
ClaudeBot | AI training | Good | Indirect (future models)
PerplexityBot | AI search index | Variable | Yes (prominent attribution)

Verifying PerplexityBot in Your Server Logs

To check if PerplexityBot is crawling your site:

  1. Access your server access logs (Apache, Nginx, or via your CDN dashboard)
  2. Filter for user agent strings containing "PerplexityBot"
  3. Check the source IP addresses against Perplexity's published IP ranges to confirm authenticity
  4. If your robots.txt has a Disallow for PerplexityBot and you still see crawl activity, you may be seeing the compliance issue documented in 2024
  5. Consider adding an IP-level block at your CDN or firewall if robots.txt-based blocking is insufficient
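
Steps 2 and 4 above can be sketched in a few lines of Python. This example assumes your logs use the common "combined" access log format (the default for many Apache and Nginx setups) and that your robots.txt disallows /dashboard/ and /api/ for PerplexityBot; adjust the pattern and prefixes to your own configuration. The sample log lines, including their user agent strings, are made up for illustration.

```python
import re

# Regex for the combined log format (an assumption about your server config).
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Path prefixes your robots.txt disallows for PerplexityBot (assumed here).
DISALLOWED_PREFIXES = ("/dashboard/", "/api/")


def perplexitybot_hits(lines):
    """Return one dict per request whose user agent contains 'PerplexityBot'."""
    hits = []
    for line in lines:
        match = LOG_PATTERN.match(line)
        if not match or "perplexitybot" not in match["agent"].lower():
            continue
        path = match["path"]
        hits.append({
            "ip": match["ip"],
            "path": path,
            # True when the request touched a path robots.txt disallows.
            "disallowed": path.startswith(DISALLOWED_PREFIXES),
        })
    return hits


# Made-up sample lines; in practice, iterate over your access log file instead.
SAMPLE_LINES = [
    '192.0.2.10 - - [12/Apr/2026:10:00:00 +0000] "GET /blog/post HTTP/1.1" 200 5120 "-" "ExampleUA (compatible; PerplexityBot/1.0)"',
    '192.0.2.11 - - [12/Apr/2026:10:00:05 +0000] "GET /api/v1/users HTTP/1.1" 200 312 "-" "ExampleUA (compatible; PerplexityBot/1.0)"',
    '203.0.113.7 - - [12/Apr/2026:10:00:09 +0000] "GET /blog/post HTTP/1.1" 200 5120 "-" "ExampleBrowser/1.0"',
]

for hit in perplexitybot_hits(SAMPLE_LINES):
    flag = "DISALLOWED PATH" if hit["disallowed"] else "ok"
    print(f'{hit["ip"]} {hit["path"]} [{flag}]')
```

Because a user agent string is easy to spoof, treat any flagged request as a lead to verify against Perplexity's published IP ranges rather than as confirmed crawler activity.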

Cloudflare's Bot Fight Mode and similar CDN bot management tools can identify and block PerplexityBot traffic at the infrastructure level, which is more reliable than robots.txt alone if you want hard enforcement.

