By SitemapFixer Team
Updated April 2026

Googlebot IP Addresses: Verifying and Identifying Real Googlebot

Why Googlebot IP Addresses Matter

Any crawler on the internet can set its user agent string to claim it is Googlebot. This is trivially easy and happens constantly — scrapers, competitors, and malicious bots all impersonate Googlebot because sites often grant it special access or skip rate limiting for it.

Knowing how to verify a real Googlebot request from a fake one matters for several reasons. If you are allowlisting Googlebot in your firewall, WAF, or CDN, fake Googlebot can slip through those rules. If you block what you think is a bad bot but is actually real Googlebot, you hurt your indexing. Server log analysis for crawl budget purposes requires accurate identification of real Googlebot traffic.

Google publishes its official verification method and its current IP ranges, but neither alone is sufficient — you need both reverse DNS verification and cross-referencing against published IP ranges for reliable identification.

How to Verify Googlebot with Reverse DNS

Google's official verification method uses a two-step DNS lookup. First, perform a reverse DNS lookup on the IP address. If the result ends in googlebot.com or google.com, proceed to step two: perform a forward DNS lookup on that hostname and confirm it resolves back to the original IP. If both lookups succeed and the IP matches, the crawler is verified as real Googlebot.

Using the command line:

# Step 1: Reverse DNS lookup on the IP
host 66.249.66.1
# Expected output: 1.66.249.66.in-addr.arpa domain name pointer crawl-66-249-66-1.googlebot.com.

# Step 2: Forward DNS lookup to confirm
host crawl-66-249-66-1.googlebot.com
# Expected output: crawl-66-249-66-1.googlebot.com has address 66.249.66.1

Using nslookup on Windows or if host is unavailable:

# Step 1: Reverse lookup
nslookup 66.249.66.1

# Step 2: Forward lookup
nslookup crawl-66-249-66-1.googlebot.com

A successful verification requires both lookups to agree. If the reverse lookup returns a non-Google hostname, or the forward lookup returns a different IP, the crawler is not real Googlebot regardless of what its user agent string says.
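
If you want to script that agreement check rather than run host by hand, the same round trip is easy to automate. Below is a minimal Python sketch using only the standard library (IPv4 only; the function name and sample IP are illustrative) that returns a verdict for a single address:

# Minimal sketch of the two-step DNS verification (IPv4, standard library only).
import socket

def is_real_googlebot(ip):
    """Return True if the IP passes the reverse + forward DNS check."""
    try:
        # Step 1: reverse DNS lookup on the IP
        hostname, _, _ = socket.gethostbyaddr(ip)
    except (socket.herror, socket.gaierror):
        return False

    # The PTR hostname must end in googlebot.com or google.com
    if not hostname.endswith((".googlebot.com", ".google.com")):
        return False

    try:
        # Step 2: forward DNS lookup on that hostname
        _, _, resolved_ips = socket.gethostbyname_ex(hostname)
    except (socket.herror, socket.gaierror):
        return False

    # Both lookups must agree on the original IP
    return ip in resolved_ips

print(is_real_googlebot("66.249.66.1"))  # expected: True for a genuine Googlebot IP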

Google's Published IP Ranges

Google publishes a machine-readable JSON file of all IP ranges used by its crawlers at https://developers.google.com/static/search/apis/ipranges/googlebot.json. A separate file at https://www.gstatic.com/ipranges/goog.json covers the broader Google infrastructure.

These files contain CIDR blocks for both IPv4 and IPv6 ranges. The structure, abbreviated, looks like this:

{
  "creationTime": "2024-01-01T00:00:00",
  "prefixes": [
    { "ipv4Prefix": "66.249.64.0/19" },
    { "ipv6Prefix": "2001:4860:4801::/48" }
  ]
}

Important caveats: these ranges change as Google adds infrastructure, so a cached copy goes stale quickly. IP range matching on its own is also weaker than it looks: an attacker can run a crawler from a Google Cloud IP, which sits inside Google's broader infrastructure ranges even though it is not in the Googlebot-specific list. Use IP range matching as a supplementary check, not the primary verification method. If you are building automated Googlebot verification into your infrastructure, poll the JSON file regularly and treat the DNS round-trip as the authoritative check.
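
For the supplementary range check, the published file is easy to consume programmatically. Here is a minimal Python sketch (standard library only; the URL is Google's published googlebot.json, the function names are just illustrative) that downloads the prefixes and tests whether an IP falls inside any of them:

# Minimal sketch: supplementary check of an IP against googlebot.json.
import json
import urllib.request
from ipaddress import ip_address, ip_network

GOOGLEBOT_RANGES_URL = "https://developers.google.com/static/search/apis/ipranges/googlebot.json"

def load_googlebot_networks():
    """Download the published prefixes and parse them into network objects."""
    with urllib.request.urlopen(GOOGLEBOT_RANGES_URL) as response:
        data = json.load(response)
    return [
        ip_network(entry.get("ipv4Prefix") or entry.get("ipv6Prefix"), strict=False)
        for entry in data["prefixes"]
    ]

def ip_in_googlebot_ranges(ip, networks):
    """Range match only; the DNS round-trip stays the authoritative check."""
    addr = ip_address(ip)
    return any(addr in net for net in networks)

networks = load_googlebot_networks()
print(ip_in_googlebot_ranges("66.249.66.1", networks))  # expected: True

Cache the downloaded prefixes and refresh them on a schedule rather than fetching the file on every request.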

Fake Googlebot: Why Bad Actors Spoof It

Impersonating Googlebot is a well-known technique used by scrapers, content theft bots, vulnerability scanners, and DDoS reconnaissance tools. The motivation is simple: many sites treat Googlebot as a trusted crawler and disable rate limiting, CAPTCHAs, and other bot protections for it.

Setting a user agent string to Googlebot requires no technical sophistication — it is a single HTTP header field. Any HTTP client can do it in one line. This means that a large fraction of traffic claiming to be Googlebot in your server logs is not actually from Google at all, particularly for smaller or newer sites.
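
To see just how trivial it is, here is a minimal Python sketch that claims to be Googlebot with a single header; the target URL is a placeholder:

# Minimal sketch: any HTTP client can claim to be Googlebot with one header.
import urllib.request

request = urllib.request.Request(
    "https://example.com/",  # placeholder target
    headers={
        "User-Agent": "Mozilla/5.0 (compatible; Googlebot/2.1; "
                      "+http://www.google.com/bot.html)"
    },
)
with urllib.request.urlopen(request) as response:
    print(response.status, len(response.read()), "bytes")

Nothing about such a request proves anything about its origin, which is why the user agent string on its own is worthless as evidence.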

Fake Googlebot traffic causes several problems: it inflates your crawl budget estimates in log analysis, it can trigger server load as if you were being crawled heavily, and if you have pages conditionally served to bots, fake Googlebot can access that content.

How Fake Googlebot Bypasses robots.txt

robots.txt rules are applied based on the user agent string in the HTTP request. If your robots.txt contains an Allow rule for Googlebot, any crawler claiming to be Googlebot falls under that rule, because robots.txt has no mechanism to verify the crawler's identity.

# This rule applies to ANY crawler claiming to be Googlebot
User-agent: Googlebot
Allow: /secret-content/

# Legitimate robots.txt usage — real Googlebot obeys this
User-agent: Googlebot
Disallow: /admin/

Real Googlebot respects robots.txt by choice as part of Google's policies. It is not technically enforced at the crawl level by any protocol. Fake Googlebot typically ignores robots.txt entirely since its operators have no incentive to comply. This means that if you have content you want to hide from bots via robots.txt, you cannot rely on the Googlebot user agent exemption to protect it.

The correct approach for protecting sensitive content is authentication and authorization at the application layer — not robots.txt rules.

Server Log Analysis: Identifying Googlebot Traffic

Raw server logs contain every request including the IP address and user agent string. To identify real Googlebot traffic for crawl budget analysis, you need to cross-reference the Googlebot user agent with the two-step DNS verification for each unique IP that claimed to be Googlebot.

In practice, you can simplify this for log analysis by first extracting all unique IPs that reported a Googlebot user agent, then batch-verifying them with the reverse/forward DNS method. Any IP that fails verification should be excluded from your Googlebot-specific analysis.

A simple approach using common Unix tools to extract claimed Googlebot requests:

# Extract unique IPs claiming to be Googlebot from access log
grep -i "googlebot" /var/log/nginx/access.log | awk '{print $1}' | sort -u

# Then verify each IP with the reverse/forward DNS method:
for ip in $(grep -i "googlebot" /var/log/nginx/access.log | awk '{print $1}' | sort -u); do
  # Reverse lookup: grab the PTR hostname and strip the trailing dot
  hostname=$(host "$ip" | awk '/pointer/ {print $NF}' | sed 's/\.$//')
  # The hostname must be Google's, and the forward lookup must return the same IP
  if echo "$hostname" | grep -qE '\.(googlebot|google)\.com$' \
     && host "$hostname" | grep -qwF "$ip"; then
    echo "VERIFIED: $ip -> $hostname"
  else
    echo "FAKE: $ip"
  fi
done

Many professional log analysis tools and SEO platforms (Screaming Frog Log File Analyser, JetOctopus, Botify) perform this verification automatically when you import your logs.

Blocking Fake Googlebot at the Server or CDN Level

The most effective approach is to use your WAF or CDN to implement Googlebot verification logic. At the edge, you can match requests claiming to be Googlebot, verify their IP against Google's published CIDR ranges, and challenge or block those that do not match.

For nginx, you can combine IP allowlisting with user agent matching:

# nginx: block requests claiming to be Googlebot from non-Google IPs
geo $is_google_ip {
  default 0;
  66.249.64.0/19 1;
  66.249.80.0/20 1;
  # Add full range from googlebot.json
}

map $http_user_agent $is_googlebot_ua {
  default 0;
  ~*googlebot 1;
}

# nginx does not allow nested "if" blocks, so combine both flags with a map
# (alongside the geo and map directives above):
map "$is_googlebot_ua$is_google_ip" $block_fake_googlebot {
  default 0;
  "10" 1;   # claims to be Googlebot (1) but the IP is not Google's (0)
}

# In server block:
if ($block_fake_googlebot) {
  return 403;
}

Cloudflare, Fastly, and AWS WAF all support IP range matching rules that you can combine with user agent conditions. The key is keeping the IP range list up to date by periodically refreshing it from Google's published JSON files. Some CDNs offer managed bot protection that handles Googlebot verification automatically.
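
A hand-maintained IP list will drift. One approach, sketched in Python below, is to regenerate an nginx include file from googlebot.json on a schedule (cron, CI, or a deploy hook); the output path and file layout here are illustrative:

# Minimal sketch: regenerate the nginx geo entries from googlebot.json.
import json
import urllib.request

GOOGLEBOT_RANGES_URL = "https://developers.google.com/static/search/apis/ipranges/googlebot.json"
OUTPUT_PATH = "/etc/nginx/conf.d/googlebot_ranges.conf"  # illustrative path

with urllib.request.urlopen(GOOGLEBOT_RANGES_URL) as response:
    data = json.load(response)

lines = ["# Auto-generated from googlebot.json -- do not edit by hand",
         "geo $is_google_ip {",
         "  default 0;"]
for entry in data["prefixes"]:
    cidr = entry.get("ipv4Prefix") or entry.get("ipv6Prefix")
    lines.append(f"  {cidr} 1;")
lines.append("}")

with open(OUTPUT_PATH, "w") as f:
    f.write("\n".join(lines) + "\n")

Once the file is rewritten, include it in place of the hand-maintained geo block and reload nginx so the new ranges take effect.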
