International Duplicate Content: Hreflang Fix Guide

Running a website across multiple countries and languages creates a structural duplicate content risk. When Google finds near-identical pages at /en-us/ and /en-gb/, it doesn't know which to show to which audience unless you implement hreflang correctly. This guide covers how to implement hreflang, the mistakes that most often break it, and how to diagnose indexing problems.

What International Duplicate Content Is

International duplicate content occurs when multiple URL versions of a page serve the same or substantially similar text but target different geographic audiences or language groups. Common patterns include separate country subfolders (/us/, /uk/, /au/) with only minor localization differences like currency symbols or phone numbers, and separate language pages (/en/, /en-gb/) that share the same copy. From Google's perspective, these look like duplicates competing for the same queries.

Why Google Gets Confused by Geo-Duplicates

Google's duplicate detection algorithms look at textual similarity across pages. Two pages that are 95% identical but differ in a handful of localized terms will frequently be treated as near-duplicates. Without explicit signals, Google picks one version to index and may choose the wrong one — ranking your US page for UK users, or consolidating link equity to a version that receives no internal links from your geo-targeted navigation. The result is suppressed rankings in the markets you're trying to serve.
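
To get a feel for how close two locale variants are, a rough word-level comparison can help. The sketch below uses Python's difflib purely as a stand-in for illustration; it is not how Google measures duplication, and the sample copy and the similarity helper are invented for this example.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity ratio between two texts, compared word by word."""
    return SequenceMatcher(None, a.split(), b.split()).ratio()


# Hypothetical US and UK copy that differs only in localized terms.
us_copy = "Order today for free shipping. Prices from $49. Call 1-800-555-0100 for help."
uk_copy = "Order today for free delivery. Prices from £39. Call 0800 555 0100 for help."

score = similarity(us_copy, uk_copy)
print(f"similarity: {score:.2f}")
```

On real pages, where the localized terms are a small share of a much longer body, the ratio would sit far closer to 1.0 than it does for these two short sentences.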

Hreflang: The Correct Fix

Hreflang is an attribute on rel="alternate" link elements that tells Google the language and optional geographic target of a page, and points to the equivalent versions in other locales. The format is hreflang="en-GB" for language-country pairs, or hreflang="en" for language-only targeting. Unlike canonical tags, hreflang is a bilateral signal: both pages must reference each other. If only one side of the relationship is declared, Google may treat the annotation as unconfirmed and ignore it.
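
As a minimal sketch of the markup, the helper below builds the link elements for a set of locale variants. The hreflang_links function and the example.com URLs are hypothetical. The point is that the same full list goes into the <head> of every variant, which is what makes the references bilateral.

```python
from html import escape


def hreflang_links(variants: dict[str, str]) -> list[str]:
    """Build one <link rel="alternate"> tag per locale variant."""
    return [
        f'<link rel="alternate" hreflang="{escape(code)}" href="{escape(url)}" />'
        for code, url in variants.items()
    ]


# Hypothetical locale variants of one page.
variants = {
    "en-US": "https://example.com/en-us/pricing",
    "en-GB": "https://example.com/en-gb/pricing",
}

# Emit the identical block on both /en-us/pricing and /en-gb/pricing.
for tag in hreflang_links(variants):
    print(tag)
```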

Implementing Hreflang in XML Sitemaps

For large sites, adding hreflang to every HTML page's <head> is error-prone and hard to maintain. The XML sitemap method is more scalable. In your sitemap, each <url> block includes <xhtml:link> elements — one for every locale variant including the page itself. The sitemap must declare the xmlns:xhtml namespace. This approach centralizes locale declarations so a CMS or static site generator can produce them programmatically, reducing the risk of annotation drift over time.
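
The sitemap structure described above could be generated along these lines. This is a sketch using Python's standard xml.etree.ElementTree; build_sitemap, the cluster dictionaries, and the example.com URLs are assumptions for illustration, not the output of any particular CMS.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
XHTML_NS = "http://www.w3.org/1999/xhtml"

# Serialize the sitemap namespace as the default and XHTML as "xhtml:".
ET.register_namespace("", SITEMAP_NS)
ET.register_namespace("xhtml", XHTML_NS)


def build_sitemap(clusters: list[dict[str, str]]) -> str:
    """Each cluster maps locale code -> URL for one page's locale variants."""
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for cluster in clusters:
        for url in cluster.values():
            url_el = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
            ET.SubElement(url_el, f"{{{SITEMAP_NS}}}loc").text = url
            # Every <url> lists all variants, including the page itself.
            for code, alt in cluster.items():
                ET.SubElement(
                    url_el,
                    f"{{{XHTML_NS}}}link",
                    rel="alternate",
                    hreflang=code,
                    href=alt,
                )
    return ET.tostring(urlset, encoding="unicode")


print(build_sitemap([
    {
        "en-US": "https://example.com/en-us/page",
        "en-GB": "https://example.com/en-gb/page",
    }
]))
```

Because every <url> block is produced from the same cluster dictionary, the self-reference and the return links come out consistent by construction. The returned string has no XML declaration, so add one when writing the file.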

Self-Referencing Hreflang Is Required

One of the most common hreflang implementation errors is omitting the self-referencing annotation. Every page must include an hreflang tag pointing to itself, using its own locale code. If /en-us/page lists annotations for /en-gb/page and /en-au/page but not itself, Google considers the set incomplete. The self-reference confirms that each URL knows its own locale, which is a prerequisite for the full annotation set to be trusted.
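
A check for this error can be sketched in a few lines, assuming the annotations have already been collected into a dictionary that maps each page URL to its locale-code-to-URL set. That data shape, the function name, and the URLs are hypothetical.

```python
def missing_self_reference(annotations: dict[str, dict[str, str]]) -> list[str]:
    """Return pages whose hreflang set does not include their own URL."""
    return [page for page, alts in annotations.items() if page not in alts.values()]


# Hypothetical crawl result: /en-us/page forgot to list itself.
annotations = {
    "https://example.com/en-us/page": {
        "en-GB": "https://example.com/en-gb/page",
        "en-AU": "https://example.com/en-au/page",
    },
    "https://example.com/en-gb/page": {
        "en-US": "https://example.com/en-us/page",
        "en-GB": "https://example.com/en-gb/page",
        "en-AU": "https://example.com/en-au/page",
    },
}

print(missing_self_reference(annotations))
```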

Common Hreflang Mistakes

Beyond missing self-references, the most frequent errors are: using invalid locale codes (hreflang expects ISO 639-1 for language and ISO 3166-1 alpha-2 for region, not custom abbreviations, so the United Kingdom is en-GB, not en-UK); pointing hreflang URLs at pages that redirect, return a non-200 status such as 404, or are blocked by robots.txt; and inconsistent annotations across the page set (page A references B but B does not reference A). Any of these can cause Google to ignore the affected annotations and, when the errors are widespread, the entire hreflang cluster.
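
Two of these checks, malformed codes and missing return links, can be sketched as follows. The annotations dictionary (page URL to locale-code-to-URL set), the function names, and the URLs are assumptions for illustration. The code check only tests the shape of the value and does not consult the ISO registries.

```python
import re

# Shape check only: two-letter language plus optional two-letter region,
# or the x-default fallback value. It does not look codes up in the ISO
# lists, so "en-UK" passes even though the ISO 3166-1 code is GB, and
# script subtags such as zh-Hant are out of scope for this sketch.
LOCALE_SHAPE = re.compile(r"^(x-default|[a-z]{2}(-[a-z]{2})?)$", re.IGNORECASE)


def malformed_codes(annotations: dict[str, dict[str, str]]) -> list[str]:
    """Return the hreflang values that do not look like language[-REGION]."""
    codes = {code for alts in annotations.values() for code in alts}
    return sorted(code for code in codes if not LOCALE_SHAPE.match(code))


def missing_return_links(annotations: dict[str, dict[str, str]]) -> list[tuple[str, str]]:
    """Return (page, target) pairs where page links to target but not vice versa."""
    missing = []
    for page, alts in annotations.items():
        for target in alts.values():
            if target == page:
                continue  # self-reference, nothing to reciprocate
            if page not in annotations.get(target, {}).values():
                missing.append((page, target))
    return missing


US = "https://example.com/en-us/page"
GB = "https://example.com/en-gb/page"
annotations = {
    US: {"en-US": US, "en-GB": GB},
    GB: {"eng-GB": GB},  # bad code, and no link back to the US page
}

print(malformed_codes(annotations))
print(missing_return_links(annotations))
```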

When to Use Canonical Instead of Hreflang

If two locale variants are truly identical (same language, no meaningful localization), canonical tags rather than hreflang are appropriate. Use a canonical from the secondary version to the primary. Hreflang is for pages that are intentionally different for different audiences; canonical is for pages that are unintentionally the same. A self-referencing canonical alongside hreflang is fine. The conflict arises when a page's canonical points to a different locale variant while the same page still appears in the hreflang set: the canonical asks Google to drop the page and hreflang asks Google to serve it. Avoid that combination unless you explicitly intend to consolidate one version while still surfacing it in a specific locale.
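
One way to flag pages that send both signals is sketched below. It assumes each page's canonical and hreflang set have already been extracted into a dictionary, and it treats a canonical that points to the page itself as no conflict. The data shape, function name, and URLs are hypothetical.

```python
def canonical_conflicts(pages: dict[str, dict]) -> list[str]:
    """Return pages that carry hreflang while canonicalizing to another URL."""
    conflicts = []
    for url, signals in pages.items():
        canonical = signals.get("canonical", url)  # no tag: treat as self
        if canonical != url and signals.get("hreflang"):
            conflicts.append(url)
    return conflicts


US = "https://example.com/en-us/page"
GB = "https://example.com/en-gb/page"
pages = {
    US: {"canonical": US, "hreflang": {"en-US": US, "en-GB": GB}},
    # The GB page asks to be served to UK users and to be folded into US.
    GB: {"canonical": US, "hreflang": {"en-US": US, "en-GB": GB}},
}

print(canonical_conflicts(pages))
```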

Geo-Targeting in Google Search Console

Google Search Console used to let you set a geographic target for a property through the International Targeting report. For a country-specific subfolder like /uk/ that was verified as its own property, you would set the target to United Kingdom. Google deprecated that report in 2022, so do not rely on it being available in current Search Console; hreflang is now the main locale signal you control directly. Where the setting was used, it applied to the whole property rather than to individual pages, so it never replaced hreflang for page-level control, and it affected country targeting only, not language targeting.

Diagnosing International Indexing Problems

Use the URL Inspection tool in GSC to check which canonical Google has selected for each locale variant. If Google consistently chooses the wrong version, your hreflang annotations likely have errors. Search Console itself no longer lists hreflang errors: they used to appear in the International Targeting report, which Google has deprecated, and the page indexing (Coverage) report does not cover them. Third-party tools like Ahrefs Site Audit and Screaming Frog have dedicated hreflang validators that crawl your entire site and detect missing return links, invalid locale codes, and redirected hreflang targets, which makes systematic diagnosis practical even for sites with thousands of locale variants.
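
For a quick spot check on a single page, the declared signals can be pulled out of the HTML with the standard library. This is a sketch: HreflangParser is a hypothetical helper, the HTML is a made-up sample, and fetching the page is left out.

```python
from html.parser import HTMLParser


class HreflangParser(HTMLParser):
    """Collect the canonical URL and hreflang alternates from <link> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.alternates: dict[str, str] = {}
        self.canonical: str | None = None

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        rel = (attr.get("rel") or "").lower().split()
        href = attr.get("href")
        if "alternate" in rel and attr.get("hreflang") and href:
            self.alternates[attr["hreflang"]] = href
        elif "canonical" in rel and href:
            self.canonical = href


sample_html = """
<head>
  <link rel="canonical" href="https://example.com/en-gb/page">
  <link rel="alternate" hreflang="en-GB" href="https://example.com/en-gb/page" />
  <link rel="alternate" hreflang="en-US" href="https://example.com/en-us/page" />
</head>
"""

parser = HreflangParser()
parser.feed(sample_html)
print(parser.canonical)
print(parser.alternates)
```

Comparing the declared canonical here with the Google-selected canonical shown in URL Inspection is a fast way to see whether the two disagree.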

Audit Your International Sitemap

Sitemap Fixer checks your hreflang annotations for broken return links, invalid locale codes, and non-200 target URLs across all your locale variants.
