Updated April 2026

PerplexityBot: What It Crawls and How to Block It


PerplexityBot is the web crawler operated by Perplexity AI, the AI-powered answer engine. It crawls the web to build the index that Perplexity uses to answer user queries with cited sources. Unlike training-focused crawlers such as GPTBot or ClaudeBot, PerplexityBot is primarily a search indexing crawler, so its purpose is closer to Googlebot's than to that of AI training bots.

PerplexityBot became controversial in 2024 when researchers and publishers discovered it was crawling websites that had explicitly blocked it in robots.txt. This raised serious questions about Perplexity's robots.txt compliance that persist into 2026. This guide covers the facts, the controversy, and what you can actually do to control PerplexityBot's access.

PerplexityBot Technical Details

Property	Value
User agent token (for robots.txt)	PerplexityBot
Full user agent string	Mozilla/5.0 ... (compatible; PerplexityBot/1.0; +https://perplexity.ai/perplexitybot)
Purpose	Search index for Perplexity AI answer engine
Operator	Perplexity AI
robots.txt compliance	Disputed (see controversy section below)

What PerplexityBot Crawls

PerplexityBot crawls publicly accessible web pages to index their content for Perplexity's answer engine. When a user asks Perplexity a question, Perplexity retrieves relevant pages from its index and synthesizes an answer with citations. If your page is in that index, your content can appear as a source in Perplexity responses, with attribution links back to your site.

Unlike pure training crawlers, PerplexityBot directly and visibly affects whether your site appears as a cited source in Perplexity answers. This is analogous to Googlebot's role in determining whether your page ranks in Google search, except that Perplexity displays fewer results and credits sources more prominently.

PerplexityBot crawls text content, metadata, and structured data. It handles JavaScript rendering to varying degrees, so content served as static HTML is more reliably indexed. An XML sitemap may help Perplexity discover your pages, but sitemap support is not officially confirmed for all versions of the bot.

The robots.txt Compliance Controversy

In June 2024, Wired and other publications reported that PerplexityBot was crawling websites that had explicitly blocked it in robots.txt. The reports included technical evidence: server logs showing PerplexityBot user agent strings on sites whose robots.txt contained a User-agent: PerplexityBot group with Disallow: /.

Perplexity initially denied the reports and claimed that its bots respected robots.txt. Researchers pushed back with log evidence showing that the crawling continued after the robots.txt blocks were in place. They also identified a second mechanism: Perplexity appeared to be using third-party infrastructure (including some residential IP address pools) to fetch pages in ways that bypassed standard user agent checks.

Perplexity subsequently updated its robots.txt policy documentation and committed to stronger compliance. As of 2026, PerplexityBot's compliance is better than it was in 2024 but still less uniformly reliable than that of Googlebot, GPTBot, or ClaudeBot. Some site owners report continued crawling despite Disallow directives.

How to Block PerplexityBot

Add this to your robots.txt to block PerplexityBot:

User-agent: PerplexityBot
Disallow: /

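If you want to allow some paths but block others, list the blocked paths explicitly. Paths not matched by any Disallow rule remain crawlable by default, so the Allow lines here mainly document intent: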

# Allow public content, block private areas
User-agent: PerplexityBot
Allow: /learn/
Allow: /blog/
Disallow: /dashboard/
Disallow: /api/
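Before deploying rules like these, it can help to check how a standards-following parser reads them. The following is a minimal sketch using Python's standard-library urllib.robotparser; the helper name is_allowed and the sample paths are illustrative assumptions, and the result only shows what a compliant crawler should do, not what PerplexityBot actually does.

```python
from urllib.robotparser import RobotFileParser

# The path-based rules from the example above, as they would appear in robots.txt.
ROBOTS_TXT = """\
# Allow public content, block private areas
User-agent: PerplexityBot
Allow: /learn/
Allow: /blog/
Disallow: /dashboard/
Disallow: /api/
"""


def is_allowed(robots_txt: str, user_agent: str, path: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch path.

    is_allowed is a hypothetical helper for this sketch, not part of any
    Perplexity tooling.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    # Sample paths are made up for illustration.
    for path in ("/blog/post", "/dashboard/settings", "/api/v1/users", "/pricing"):
        verdict = "allowed" if is_allowed(ROBOTS_TXT, "PerplexityBot", path) else "blocked"
        print(f"{path}: {verdict}")
```

Note that /pricing comes back as allowed even though no rule mentions it, which is the default-allow behavior of robots.txt.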

Given the documented compliance issues, some site owners add IP-level blocks using their CDN or firewall in addition to robots.txt rules. Perplexity publishes its crawler IP ranges, and blocking those ranges provides a harder technical barrier than robots.txt alone.
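As a rough illustration of matching a request IP against a list of published ranges, here is a minimal sketch using Python's standard-library ipaddress module. The CIDR blocks shown are reserved documentation ranges used as placeholders, not Perplexity's real ranges, and in_published_ranges is a hypothetical helper name; substitute the ranges Perplexity currently publishes.

```python
import ipaddress

# PLACEHOLDER ranges (RFC 5737 documentation blocks), NOT Perplexity's real ranges.
# Replace with the ranges Perplexity currently publishes.
EXAMPLE_RANGES = ["192.0.2.0/24", "198.51.100.0/24"]


def in_published_ranges(ip: str, cidrs: list[str]) -> bool:
    """Return True if ip falls inside any of the given CIDR blocks."""
    addr = ipaddress.ip_address(ip)
    return any(addr in ipaddress.ip_network(cidr) for cidr in cidrs)


if __name__ == "__main__":
    for ip in ("192.0.2.44", "203.0.113.9"):
        status = "inside" if in_published_ranges(ip, EXAMPLE_RANGES) else "outside"
        print(f"{ip} is {status} the listed ranges")
```

In practice the same range list would usually be loaded into a CDN or firewall rule rather than checked in application code; the sketch only shows the matching logic.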

Should You Block PerplexityBot?

The decision depends on your content type and business goals:

Reasons to allow PerplexityBot

Your site can appear as a cited source in Perplexity answers, which can drive referral traffic. Perplexity displays source attribution more prominently than Google AI Overviews.
Perplexity has a large and growing user base of technical and research-oriented users, the same audience many B2B and SaaS sites want to reach.
Being indexed by multiple AI search engines (Perplexity, ChatGPT search, Google) diversifies your traffic sources.

Reasons to block PerplexityBot

You have paywalled or subscription content that should not be summarized freely in AI answers.
You are concerned about content being reproduced in AI-generated answers without driving click-through (the "zero-click" concern).
You object on principle to your content being used to power an AI service without compensation or opt-in.
You have verified via logs that PerplexityBot is consuming significant crawl budget without commensurate referral traffic benefit.

PerplexityBot vs. Other AI Crawlers

Bot	Primary use	robots.txt reliability	Citation in results
Googlebot	Search index	Excellent	Yes (AI Overviews)
GPTBot	AI training	Good	Indirect (future models)
ClaudeBot	AI training	Good	Indirect (future models)
PerplexityBot	AI search index	Variable	Yes (prominent attribution)

Verifying PerplexityBot in Your Server Logs

To check if PerplexityBot is crawling your site:

Access your server access logs (Apache, Nginx, or via your CDN dashboard)
Filter for user agent strings containing "PerplexityBot"
Check the source IP addresses against Perplexity's published IP ranges to confirm authenticity
If your robots.txt has a Disallow for PerplexityBot and you still see crawl activity, you may be seeing the compliance issue documented in 2024
Consider adding an IP-level block at your CDN or firewall if robots.txt-based blocking is insufficient
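The filtering step above can be sketched in Python. This is a minimal example that assumes logs in the common Apache/Nginx "combined" format; the regular expression, the perplexitybot_hits helper, and the sample log lines (including their simplified user agent strings and documentation-range IPs) are all illustrative assumptions, not real traffic.

```python
import re
from collections import Counter

# Assumes the "combined" log format:
# ip ident user [time] "METHOD path proto" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

# Made-up sample lines for illustration only.
SAMPLE_LOG = [
    '192.0.2.10 - - [01/Apr/2026:10:00:00 +0000] "GET /blog/post HTTP/1.1" 200 5120 '
    '"-" "Mozilla/5.0 (compatible; PerplexityBot/1.0; +https://perplexity.ai/perplexitybot)"',
    '203.0.113.5 - - [01/Apr/2026:10:00:01 +0000] "GET /pricing HTTP/1.1" 200 2048 '
    '"-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"',
    '192.0.2.10 - - [01/Apr/2026:10:00:02 +0000] "GET /dashboard/ HTTP/1.1" 200 1024 '
    '"-" "Mozilla/5.0 (compatible; PerplexityBot/1.0; +https://perplexity.ai/perplexitybot)"',
]


def perplexitybot_hits(lines):
    """Return (ip, path) pairs for requests whose user agent mentions PerplexityBot."""
    hits = []
    for line in lines:
        match = LOG_LINE.match(line)
        if match and "perplexitybot" in match.group("agent").lower():
            hits.append((match.group("ip"), match.group("path")))
    return hits


if __name__ == "__main__":
    hits = perplexitybot_hits(SAMPLE_LOG)
    # Count requests per source IP so each IP can be checked against published ranges.
    for ip, count in Counter(ip for ip, _ in hits).items():
        print(f"{ip}: {count} request(s)")
    # Flag hits on a path that robots.txt is assumed to disallow.
    for ip, path in hits:
        if path.startswith("/dashboard/"):
            print(f"disallowed path fetched: {path} from {ip}")
```

A user agent match alone does not prove the request came from Perplexity, since the string is easy to spoof, which is why the IP check in the steps above still matters.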

Cloudflare's Bot Fight Mode and similar CDN bot management tools can challenge or block automated traffic, including PerplexityBot, at the infrastructure level. If you want hard enforcement, this is more reliable than robots.txt alone.
