Google Crawlers: Complete List of User Agents
Google operates multiple distinct web crawlers, each with its own user agent token and specific purpose. Understanding which Google bot does what matters when configuring robots.txt, diagnosing crawl issues in server logs, and verifying Googlebot activity using IP address lookups.
This guide covers every active Google crawler, its user agent string, what it crawls, and how to target or block it in robots.txt.
Complete Google Crawler Reference
| Crawler Name | User Agent Token | Purpose |
|---|---|---|
| Googlebot | Googlebot | Main web crawl for Google Search index |
| Googlebot-Image | Googlebot-Image | Crawls images for Google Images |
| Googlebot-News | Googlebot-News | Crawls articles for Google News |
| Googlebot-Video | Googlebot-Video | Crawls videos for Google Video Search |
| Google-Extended | Google-Extended | robots.txt control token for use of content in AI training and Gemini products (no separate crawler user agent string) |
| AdsBot-Google | AdsBot-Google | Crawls landing pages for Google Ads quality |
| AdsBot-Google-Mobile | AdsBot-Google-Mobile | Mobile version of AdsBot for landing page quality |
| APIs-Google | APIs-Google | Delivers push notification messages for Google APIs |
| Mediapartners-Google | Mediapartners-Google | AdSense crawls for ad targeting |
| Google-InspectionTool | Google-InspectionTool | URL Inspection tool in Search Console |
| Googlebot Smartphone | Googlebot | Mobile-first indexing crawl (smartphone variant of the main Googlebot) |
Googlebot (Main Web Crawler)
Googlebot is Google's primary crawler for building and maintaining the web search index. It comes in two variants:
- Googlebot Desktop: Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
- Googlebot Smartphone: Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) ... (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
Since Google made mobile-first indexing the default for new sites in 2019, Googlebot primarily uses the smartphone user agent. The desktop user agent is still used, but less frequently. In robots.txt, a rule for User-agent: Googlebot applies to both variants.
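When reading server logs, it can help to label which variant a request claims to be. The sketch below is one hedged way to do that, based only on the two user agent strings quoted above. The function name classify_googlebot and the Android-marker heuristic are assumptions for illustration, and a match says nothing about whether the request is genuine.

```python
import re

# Both variants quoted above carry a "Googlebot/<version>" product token.
GOOGLEBOT_TOKEN = re.compile(r"\bGooglebot/\d+(\.\d+)?")


def classify_googlebot(user_agent: str) -> str:
    """Return 'smartphone', 'desktop', or 'not-googlebot' for a UA string."""
    if not GOOGLEBOT_TOKEN.search(user_agent):
        return "not-googlebot"
    # Assumption: the smartphone variant is recognised by its Android marker.
    if "Android" in user_agent:
        return "smartphone"
    return "desktop"


desktop_ua = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
smartphone_ua = (
    "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) ... "
    "(compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
)

print(classify_googlebot(desktop_ua))
print(classify_googlebot(smartphone_ua))
```

The label reflects only what the request header claims, so pair it with the IP verification described later in this guide before acting on it.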
Googlebot reads and respects your XML sitemap. Submit your sitemap in Google Search Console to ensure Googlebot discovers it, and reference it in robots.txt via Sitemap: https://yoursite.com/sitemap.xml for automatic discovery.
Google-Extended (AI Training Crawler)
Google-Extended is a robots.txt user agent token that controls whether content Google crawls from your site may be used for AI training and Gemini products. It was introduced in 2023 to give site owners a way to opt out of AI training independently of the main search index. Blocking Google-Extended will not affect your Google Search rankings; it only affects whether your content is used for Bard/Gemini AI improvement.
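As a minimal sketch, the opt-out can be written as a Google-Extended group and sanity-checked with Python's standard urllib.robotparser. The example.com URL and the contents of ROBOTS_TXT are illustrative assumptions, and Python's parser only approximates how Google evaluates robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: opt out of AI training, leave everything else open.
ROBOTS_TXT = """\
User-agent: Google-Extended
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

url = "https://example.com/articles/post"
extended_allowed = parser.can_fetch("Google-Extended", url)
googlebot_allowed = parser.can_fetch("Googlebot", url)

# The Google-Extended group blocks that token; Googlebot falls through to "*".
print(extended_allowed, googlebot_allowed)
```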
AdsBot-Google
AdsBot-Google crawls your Google Ads landing pages to evaluate ad quality. It ignores rules in the wildcard User-agent: * group, so a global Disallow does not apply to it. To block AdsBot, you must add a group that names it explicitly.
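Because a wildcard group is not enough here, one simple safeguard is to check whether a robots.txt file names AdsBot at all. The following is a hedged sketch of such a check; the helper names named_user_agents and adsbot_is_addressed are hypothetical, and the parsing is deliberately minimal rather than a full robots.txt implementation.

```python
def named_user_agents(robots_txt: str) -> set[str]:
    """Collect every token that appears on a User-agent line, lowercased."""
    tokens = set()
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "user-agent":
            tokens.add(value.strip().lower())
    return tokens


def adsbot_is_addressed(robots_txt: str) -> bool:
    """True when some group names AdsBot-Google explicitly."""
    return "adsbot-google" in named_user_agents(robots_txt)


# Hypothetical files: the first relies on the wildcard, the second names AdsBot.
WILDCARD_ONLY = "User-agent: *\nDisallow: /\n"
WITH_ADSBOT = (
    "User-agent: *\nDisallow: /\n\n"
    "User-agent: AdsBot-Google\nUser-agent: AdsBot-Google-Mobile\nDisallow: /\n"
)

print(adsbot_is_addressed(WILDCARD_ONLY))
print(adsbot_is_addressed(WITH_ADSBOT))
```

The check only confirms that a group exists for the token; it does not evaluate which paths that group allows or blocks.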
Googlebot-Image, Googlebot-News, Googlebot-Video
These specialist crawlers index content for specific Google products. If you want your images to appear in Google Images but not be indexed for web search (unusual, but possible), you can Allow Googlebot-Image while Disallowing Googlebot on specific paths. Similarly, sites with a News Publisher Center account can control Googlebot-News separately from the main crawler.
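A group for User-agent: Googlebot is not always limited to the main crawler. According to Google's crawler documentation, Googlebot-Image, Googlebot-News, and Googlebot-Video follow their own group when one exists and otherwise fall back to the Googlebot group. To control them independently, give each one its own group in robots.txt.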
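A sketch of controlling Googlebot-Image separately from Googlebot, checked with Python's urllib.robotparser. The /media/ path and example.com URL are placeholders. Python's parser uses the first group that matches in file order, so the more specific group is listed first here; that ordering is a concession to this parser, not something the article requires.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: images stay open to Googlebot-Image,
# while the main crawler is kept out of the same directory.
ROBOTS_TXT = """\
User-agent: Googlebot-Image
Allow: /media/

User-agent: Googlebot
Disallow: /media/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

image_url = "https://example.com/media/photo.jpg"
image_bot_allowed = parser.can_fetch("Googlebot-Image", image_url)
main_bot_allowed = parser.can_fetch("Googlebot", image_url)

print(image_bot_allowed, main_bot_allowed)
```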
Verifying Googlebot: IP Address Lookup
Because any bot can spoof a user agent string, a user agent match alone proves nothing. The reliable check is a reverse DNS lookup on the IP address, confirming that it resolves to a hostname under google.com or googlebot.com, followed by a forward lookup on that hostname. Google publishes this verification process in its documentation.
Steps to verify Googlebot:
- Find the IP address of the suspicious request in your server access logs
- Run a reverse DNS lookup with host [IP address]; the result should show a hostname ending in googlebot.com or google.com
- Run a forward DNS lookup on that hostname to confirm it resolves back to the original IP
- If the hostname is under google.com or googlebot.com and the forward lookup returns the original IP, the request is genuine Googlebot
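The steps above can be sketched in Python with the standard socket module. This is a minimal illustration, not Google's own tooling: the function name is_verified_googlebot and the injectable lookup parameters are assumptions added so the logic can be exercised without live DNS.

```python
import socket

# Hostname suffixes named in the steps above.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def _reverse(ip: str) -> str:
    """Reverse DNS: IP address to primary hostname."""
    return socket.gethostbyaddr(ip)[0]


def _forward(hostname: str) -> set[str]:
    """Forward DNS: hostname to the set of addresses it resolves to."""
    return {info[4][0] for info in socket.getaddrinfo(hostname, None)}


def is_verified_googlebot(ip, reverse_lookup=None, forward_lookup=None) -> bool:
    """Reverse lookup, suffix check, then forward lookup back to the same IP."""
    reverse_lookup = reverse_lookup or _reverse
    forward_lookup = forward_lookup or _forward
    try:
        hostname = reverse_lookup(ip)
    except OSError:  # no PTR record or resolver failure
        return False
    if not hostname.lower().rstrip(".").endswith(GOOGLE_SUFFIXES):
        return False
    try:
        # Plain string comparison; IPv6 text forms may need normalising first.
        return ip in forward_lookup(hostname)
    except OSError:
        return False
```

Calling is_verified_googlebot with only an IP address from your access logs performs live DNS queries. Results are worth caching, since a lookup per request would be slow on a busy site.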
Google also publishes its crawler IP ranges in a JSON file at https://developers.google.com/search/apis/ipranges/googlebot.json. You can use this for automated IP-based verification in your CDN or firewall rules.
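For the IP-range approach, a hedged sketch using Python's ipaddress module follows. It assumes the published file has a top-level prefixes list whose entries carry either an ipv4Prefix or an ipv6Prefix key. SAMPLE_JSON is a stand-in built from reserved documentation ranges, not Google's real data, so download the actual file before using this for anything real.

```python
import ipaddress
import json

# Placeholder document in the assumed layout; these are documentation-only
# ranges (192.0.2.0/24, 2001:db8::/32), NOT Google's published prefixes.
SAMPLE_JSON = """
{
  "prefixes": [
    {"ipv4Prefix": "192.0.2.0/24"},
    {"ipv6Prefix": "2001:db8::/32"}
  ]
}
"""


def load_networks(document: str) -> list:
    """Parse the JSON text into a list of ip_network objects."""
    networks = []
    for entry in json.loads(document).get("prefixes", []):
        cidr = entry.get("ipv4Prefix") or entry.get("ipv6Prefix")
        if cidr:
            networks.append(ipaddress.ip_network(cidr))
    return networks


def ip_in_ranges(ip: str, networks: list) -> bool:
    """True when the address falls inside any listed network."""
    address = ipaddress.ip_address(ip)
    # Membership is False when address and network IP versions differ.
    return any(address in network for network in networks)


networks = load_networks(SAMPLE_JSON)
print(ip_in_ranges("192.0.2.10", networks))
print(ip_in_ranges("198.51.100.1", networks))
```

The published ranges change over time, so an automated setup would refresh its copy on a schedule rather than hard-coding the prefixes.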
robots.txt for Google Crawlers: Common Patterns
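One way to try out a pattern before deploying it is to print which tokens may fetch a given URL. The sketch below does this with Python's urllib.robotparser. The robots.txt contents, the paths, and the helper names access_matrix and listed_sitemaps are all hypothetical, and Python's parser is only an approximation of Google's matching (for example, it applies the wildcard group to every agent).

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt combining per-crawler groups with a sitemap line.
ROBOTS_TXT = """\
User-agent: Googlebot-News
Disallow: /archive/

User-agent: Googlebot
Allow: /

User-agent: *
Disallow: /search/

Sitemap: https://yoursite.com/sitemap.xml
"""


def access_matrix(robots_txt: str, tokens: list, url: str) -> dict:
    """Map each user agent token to whether it may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {token: parser.can_fetch(token, url) for token in tokens}


def listed_sitemaps(robots_txt: str) -> list:
    """Return the Sitemap URLs declared in the file (empty list if none)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.site_maps() or []


TOKENS = ["Googlebot", "Googlebot-News", "GPTBot"]
for path in ("/archive/2019", "/search/results"):
    print(path, access_matrix(ROBOTS_TXT, TOKENS, "https://yoursite.com" + path))
print(listed_sitemaps(ROBOTS_TXT))
```

Treat the output as a quick sanity check; for Google's crawlers, the robots.txt report in Search Console remains the authoritative test.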
Google Crawlers vs. Third-Party AI Crawlers
Google's crawlers are distinct from the AI crawlers operated by OpenAI (GPTBot), Anthropic (ClaudeBot), and Perplexity (PerplexityBot). Each requires its own robots.txt rules. A Disallow for Googlebot does not block GPTBot, and vice versa. See the individual guides below for each third-party AI crawler.
Related Guides
- Googlebot Image: How Google Crawls and Indexes Images
- Googlebot News: Requirements and Optimization for Google News
- AdsBot Google: The Ad Crawler That Ignores robots.txt
- Googlebot Smartphone: Mobile-First Indexing Explained
- Googlebot IP Addresses: How to Verify Real Googlebot
- GPTBot: How to Control OpenAI's Web Crawler
- How to Block Bad Bots: robots.txt and CDN Methods
- robots.txt Complete Guide: Syntax, Testing, and Best Practices