Invalid XML Syntax in Your Sitemap
XML is strict: a single unescaped ampersand, an unclosed tag, or a stray byte-order mark at the start of the file is enough to make Googlebot reject your entire sitemap. When this happens, none of the URLs inside are processed - they're all invisible to search engines until the syntax error is fixed.
What is this error?
An invalid XML syntax error occurs when your sitemap.xml fails XML well-formedness rules or the basic sitemap schema. Common cases include: unescaped special characters in URLs (&, <, >, ", '), unclosed tags (<loc> without </loc>), a missing xmlns namespace declaration on the root element, content placed ahead of the XML declaration, or a UTF-8 byte-order mark (BOM) before the XML declaration. Search Console reports "Sitemap could not be read."
Why does it happen?
The most common cause is URLs containing query strings with & characters that weren't escaped to &amp;. Other causes: CMS plugins that concatenate strings instead of using a real XML library, template engines that emit unclosed tags when a field is null, file editors that save with a BOM, and copy-paste edits that break tag matching. Files written on Windows sometimes mix CRLF and LF line endings; conforming XML parsers normalize these, but hand-rolled, line-based tooling may not.
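To make the ampersand case concrete, here is a minimal Python sketch of escaping a URL before it goes into a <loc> element. The helper name `loc_element` and the example.com URL are illustrative assumptions; `xml.sax.saxutils.escape` is part of the standard library.

```python
from xml.sax.saxutils import escape

def loc_element(url: str) -> str:
    """Return a <loc> element with the URL's XML special characters escaped."""
    # escape() handles &, < and > by default; the extra mapping covers quotes.
    return "<loc>" + escape(url, {'"': "&quot;", "'": "&apos;"}) + "</loc>"

print(loc_element("https://example.com/article?id=123&author=jane"))
# prints <loc>https://example.com/article?id=123&amp;author=jane</loc>
```

This only patches the escaping step. A generator that builds the whole document with an XML library avoids the problem at its source.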
Why does it hurt SEO?
It's the worst possible sitemap error: total failure. Google's XML parser stops at the first syntax error and treats the file as unreadable, meaning zero URLs in the sitemap contribute to discovery or lastmod signals. A single malformed ampersand in URL #2,847 out of 50,000 can nullify the other 49,999. Every day the error persists, fresh content accumulates without any discovery boost.
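The all-or-nothing behavior is easy to reproduce with any strict XML parser. The sketch below uses Python's standard `xml.etree.ElementTree` as a stand-in; it illustrates how strict parsing works in general and makes no claim about Google's own parser. The function name `count_urls` and the sample URLs are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def count_urls(sitemap_xml: str) -> int:
    """Return the number of <url> entries, or 0 if the document is not well-formed."""
    try:
        root = ET.fromstring(sitemap_xml)
    except ET.ParseError:
        # A well-formedness error is fatal: no partial tree is returned.
        return 0
    return len(root.findall(f"{{{NS}}}url"))

entries = "".join(
    f"<url><loc>https://example.com/p/{i}</loc></url>" for i in range(3)
)
good = f'<urlset xmlns="{NS}">{entries}</urlset>'
# Break exactly one entry by introducing an unescaped ampersand.
bad = good.replace("/p/1", "/p?id=1&x=2")

print(count_urls(good))  # 3
print(count_urls(bad))   # 0
```

One bad entry costs the two valid ones as well, which is the same failure mode described above at a smaller scale.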
How to detect it
Run your sitemap through xmllint --noout sitemap.xml on the command line - it prints the line number of each parse error and marks the offending position with a caret. Google Search Console's sitemap report also shows "Sitemap could not be read", but only with a generic error message. Sitemap Fixer runs a full XML parse and reports the first error location plus any schema violations (missing xmlns, wrong element nesting).
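If xmllint is not available, a similar check can be sketched with Python's standard library. The function name `first_xml_error` is an assumption for illustration; `ParseError.position` is the (line, column) tuple documented for `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET

def first_xml_error(data: bytes):
    """Return (line, column, message) for the first parse error, or None."""
    try:
        ET.fromstring(data)
    except ET.ParseError as exc:
        line, column = exc.position  # location reported by the underlying parser
        return line, column, str(exc)
    return None

sample = (
    b'<?xml version="1.0" encoding="UTF-8"?>\n'
    b'<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
    b'<url><loc>https://example.com/a?x=1&y=2</loc></url>\n'
    b'</urlset>\n'
)
print(first_xml_error(sample))  # the error is reported on line 3
```

Like xmllint, this checks well-formedness only. It does not validate against the sitemap schema.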
How to fix it
1. Run `xmllint --noout sitemap.xml` to get the exact line number of the first error.
2. Escape all special characters in each <loc> value: replace & with &amp;, < with &lt;, > with &gt;, " with &quot;, and ' with &apos;.
3. Confirm the file starts with <?xml version="1.0" encoding="UTF-8"?> with no BOM before it.
4. Verify the root <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"> is present.
5. Switch your generator from string concatenation to a proper XML library (lxml, xml.etree, xmlbuilder2).
6. Re-validate, upload, and resubmit the sitemap in Search Console.
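For the generator switch in the steps above, one possible shape using Python's standard `xml.etree.ElementTree` is sketched below. The function name `build_sitemap` and the sample URL are assumptions; the `xml_declaration` argument to `ET.tostring` requires Python 3.8 or later.

```python
import xml.etree.ElementTree as ET

NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(urls) -> bytes:
    """Serialize URLs into sitemap XML; the library escapes text content."""
    ET.register_namespace("", NS)  # emit xmlns="..." without a prefix
    urlset = ET.Element(f"{{{NS}}}urlset")
    for url in urls:
        entry = ET.SubElement(urlset, f"{{{NS}}}url")
        ET.SubElement(entry, f"{{{NS}}}loc").text = url
    # Returns UTF-8 bytes that begin with the XML declaration and carry no BOM.
    return ET.tostring(urlset, encoding="UTF-8", xml_declaration=True)

data = build_sitemap(["https://example.com/article?id=123&author=jane"])
with open("sitemap.xml", "wb") as fh:  # binary mode keeps the bytes untouched
    fh.write(data)
```

Because the URL is assigned as element text rather than pasted into a string, the ampersand is written as &amp; automatically and every tag is closed by construction.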
Real-world example
A media site's sitemap broke after they added query-string-based article URLs like /article?id=123&author=jane. The ampersand wasn't escaped. Search Console could no longer read the 12,000-URL sitemap, and the site fell back to 22 indexed pages, the ones reachable from its home page links. After escaping the ampersand to &amp;, coverage recovered to 11,800 within 2 weeks.
Common mistakes
- Building sitemaps with string concatenation instead of a real XML library
- Saving the sitemap in a text editor that adds a UTF-8 BOM
- Forgetting that XML is case-sensitive: <Url> and <url> are different tags
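The BOM and case-sensitivity mistakes in the list above can both be caught with a small pre-flight check. The sketch below is one way to do it in Python; the helper names `strip_utf8_bom` and `unknown_tags` are assumptions, and the tag list covers only the core sitemap protocol, so elements from image, video, or news extensions would also be reported.

```python
import codecs
import xml.etree.ElementTree as ET

NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"
# Element names defined by the core sitemap protocol, all lowercase.
KNOWN_TAGS = {
    NS + name
    for name in ("urlset", "url", "loc", "lastmod", "changefreq", "priority")
}

def strip_utf8_bom(data: bytes) -> bytes:
    """Drop a leading UTF-8 BOM (bytes EF BB BF) if an editor added one."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data

def unknown_tags(data: bytes) -> set:
    """Return tag names outside the core sitemap protocol, such as 'Url'."""
    root = ET.fromstring(strip_utf8_bom(data))
    return {el.tag.replace(NS, "") for el in root.iter() if el.tag not in KNOWN_TAGS}

sample = codecs.BOM_UTF8 + (
    b'<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    b"<Url><loc>https://example.com/</loc></Url></urlset>"
)
print(unknown_tags(sample))  # {'Url'}
```

A miscased tag such as <Url> is still well-formed XML, so a plain parse will not flag it. That is why the check compares names against the protocol's element list.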