Fix Your Sitemap for Gatsby
Gatsby emits sitemap-index.xml at build via gatsby-plugin-sitemap, but sitemap quality depends entirely on the GraphQL query, resolveSiteUrl, and exclude patterns you set in gatsby-config.js.
Gatsby builds its sitemap from the allSitePage GraphQL node, so any page not in allSitePage won't land in the sitemap. Any page you didn't want in there, but that got created by a source plugin, will. This guide covers what breaks at the node level and how to fix it.
Audited a Gatsby + Contentful site earlier this year with 2,400 published entries. The sitemap had 7,800 URLs. The culprit: draft entries from Contentful's preview API plus paginated tag archives the team had forgotten existed. Fixing the GraphQL filter dropped the sitemap to 2,390 URLs - closer to what actually deserves indexing.
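One way to find where that kind of bloat comes from is to group the URLs in a built shard by their first path segment. The sketch below is only an audit aid, not part of the plugin: the function name and the sample XML are hypothetical, and it assumes plain loc entries that a regex can read.

```javascript
// Sketch: count sitemap URLs by first path segment to see where bloat comes from.
// Assumes simple <loc>...</loc> entries; a regex is enough for an audit, not a full XML parser.
function countByTopLevelSegment(sitemapXml) {
  const counts = {};
  for (const match of sitemapXml.matchAll(/<loc>\s*([^<\s]+)\s*<\/loc>/g)) {
    // The placeholder base lets relative entries (missing siteUrl) parse too.
    const { pathname } = new URL(match[1], 'https://placeholder.invalid');
    // "/blog/my-post/" -> "blog"; the home page "/" is counted under "(root)".
    const segment = pathname.split('/').filter(Boolean)[0] || '(root)';
    counts[segment] = (counts[segment] || 0) + 1;
  }
  return counts;
}

// Hypothetical shard content, for illustration only.
const sampleXml = `
<urlset>
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/blog/hello/</loc></url>
  <url><loc>https://example.com/tags/news/2/</loc></url>
  <url><loc>https://example.com/tags/news/3/</loc></url>
</urlset>`;

console.log(countByTopLevelSegment(sampleXml));
```

A segment whose count is far above the number of published entries of that type (tag archives, preview paths) is usually the first place to tighten the query or excludes.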
gatsby-plugin-sitemap vs gatsby-plugin-advanced-sitemap
gatsby-plugin-sitemap is the official one. It produces one index plus sharded files, respects your siteUrl, and accepts query, resolvePages, and serialize hooks. Good default.
gatsby-plugin-advanced-sitemap is the Ghost fork. It splits into per-type sitemaps automatically (pages, posts, authors, tags) and includes image entries when your source plugin exposes them. Use it if you want typed sitemaps without writing the serialize logic yourself. The trade-off: slower release cadence and a query shape that assumes Ghost/Contentful conventions.
Common Gatsby Sitemap Issues
- siteUrl missing from siteMetadata, producing relative URLs in sitemap-0.xml
- Dev-only routes like /dev-404-page/ and /offline-plugin-app-shell-fallback/ slipping into production
- Paginated archive nodes (/blog/2, /blog/3) included without review
- Contentful or Sanity drafts leaking in because no filter on node_locale or publish status
- Client-only routes (defined with matchPath) missing entirely - they have no page node at build
- i18n locales generating /en and /de with no hreflang alternates
- Trailing-slash mismatch between sitemap output and actual page URLs, triggering 301s on every crawl
- The plugin running at onPostBuild but your CI uploading public/ before the build finishes
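Several of the issues above can be spotted mechanically once you have the list of loc values from a built sitemap. The following is a rough sketch rather than a complete linter: the function name, the pagination heuristic, and the expectTrailingSlash option are assumptions made for illustration.

```javascript
// Sketch: flag sitemap entries that match some of the issues listed above.
const DEV_ROUTES = ['/dev-404-page', '/offline-plugin-app-shell-fallback'];

// `expectTrailingSlash` is an assumption about how your host serves pages.
function lintSitemapEntry(loc, { expectTrailingSlash = true } = {}) {
  const problems = [];
  if (!/^https?:\/\//.test(loc)) {
    problems.push('relative URL (is siteMetadata.siteUrl set?)');
  }
  // Resolve against a placeholder origin so relative entries can still be inspected.
  const { pathname } = new URL(loc, 'https://placeholder.invalid');
  if (DEV_ROUTES.some(route => pathname.startsWith(route))) {
    problems.push('dev-only route');
  }
  // Heuristic: a purely numeric last segment usually means a paginated archive.
  if (/\/\d+\/?$/.test(pathname)) {
    problems.push('looks like a paginated archive; review before indexing');
  }
  if (pathname !== '/' && pathname.endsWith('/') !== expectTrailingSlash) {
    problems.push('trailing-slash mismatch');
  }
  return problems;
}

console.log(lintSitemapEntry('/blog/2'));
console.log(lintSitemapEntry('https://example.com/blog/hello/'));
```

An empty array means none of these checks fired. Drafts and missing client-only routes cannot be detected from the URL alone and still need a look at the query.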
Working gatsby-config.js example
module.exports = {
  siteMetadata: {
    siteUrl: 'https://example.com',
  },
  plugins: [
    {
      resolve: 'gatsby-plugin-sitemap',
      options: {
        // In plugin v4+ output is the folder that receives sitemap-index.xml
        // and its shards, not a file name.
        output: '/',
        excludes: [
          '/dev-404-page',
          '/offline-plugin-app-shell-fallback',
          '/404',
          '/404.html',
          '/preview/*',
          '/admin/*',
        ],
        // Pages outside /draft, plus en-US posts used for the lastmod lookup.
        query: `{
          allSitePage(filter: {path: {regex: "/^(?!/draft)/"}}) {
            nodes { path }
          }
          allContentfulPost(filter: {node_locale: {eq: "en-US"}}) {
            nodes { slug updatedAt }
          }
        }`,
        resolveSiteUrl: () => 'https://example.com',
        // Attach each post's updatedAt to the page with the matching path.
        resolvePages: ({ allSitePage, allContentfulPost }) => {
          const postMap = Object.fromEntries(
            allContentfulPost.nodes.map(p => [`/blog/${p.slug}/`, p.updatedAt])
          );
          return allSitePage.nodes.map(page => ({
            ...page,
            lastmod: postMap[page.path] || undefined,
          }));
        },
        // url stays relative; the plugin prepends the resolved site URL.
        serialize: ({ path, lastmod }) => ({
          url: path,
          lastmod,
          changefreq: 'weekly',
        }),
      },
    },
  ],
};
Handling i18n locales
gatsby-plugin-react-i18next (or gatsby-plugin-intl) creates one page node per locale prefix. The default sitemap lists all of them but without hreflang alternates, so search engines get no signal that the variants are translations of one another and may treat them as separate pages with duplicate content.
Use the serialize hook to emit an xhtml:link alternate array per URL, or generate one sitemap file per locale and reference them all from sitemap-index.xml. The per-locale approach is easier to debug: when French pages drop out of the index, you see which file they came from.
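A minimal sketch of the serialize approach follows. It assumes every locale is served under its own prefix (/en/..., /de/...), and that the links array is passed through to the sitemap package the plugin uses, which is what turns it into xhtml:link elements; check both against your plugin version. LOCALES, SITE_URL, and the function names are hypothetical.

```javascript
// Hypothetical locale list; replace with the locales your i18n plugin generates.
const LOCALES = ['en', 'de'];
const SITE_URL = 'https://example.com';

// "/de/pricing/" -> "/pricing/"; paths without a locale prefix are returned unchanged.
function stripLocale(path, locales = LOCALES) {
  const [, first, ...rest] = path.split('/');
  return locales.includes(first) ? `/${rest.join('/')}` : path;
}

// Build one sitemap entry carrying an alternate link for every locale.
function serializeWithAlternates({ path }) {
  const basePath = stripLocale(path);
  return {
    url: path,
    links: LOCALES.map(lang => ({
      lang,
      url: `${SITE_URL}/${lang}${basePath}`,
    })),
  };
}

console.log(serializeWithAlternates({ path: '/de/pricing/' }));
```

To try it, pass serializeWithAlternates as the serialize option. If your default locale is served without a prefix, the URL template needs a special case for it.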
Client-only routes
Routes created in gatsby-node.js with createPage({ matchPath: '/app/*' }) are resolved in the browser, so the individual URLs under the pattern never appear in allSitePage; at most the single page that owns the matchPath does. If these routes need indexing (rare for /app/, common for /profile/[id]), use resolvePages to inject them manually from your data source. Most Gatsby /app/ dashboards should not be in the sitemap anyway.
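As an illustration of injecting such routes, the sketch below merges the queried page nodes with a manually supplied list of profile URLs. The ID list, the /profile/ path shape, and the function name are hypothetical; in a real build the IDs would come from your own data source.

```javascript
// Hypothetical list of public profile IDs, e.g. loaded from your API at build time.
const publicProfileIds = ['ada', 'grace'];

// Merge page nodes from the GraphQL query with manually listed client-only URLs.
function resolvePagesWithClientRoutes({ allSitePage }, profileIds = publicProfileIds) {
  const known = new Set(allSitePage.nodes.map(page => page.path));
  const injected = profileIds
    .map(id => ({ path: `/profile/${id}/` }))
    .filter(page => !known.has(page.path)); // avoid duplicate entries
  return [...allSitePage.nodes, ...injected];
}

// Example query result shape, for illustration only.
const queryResult = { allSitePage: { nodes: [{ path: '/' }, { path: '/profile/ada/' }] } };
console.log(resolvePagesWithClientRoutes(queryResult));
```

Used as the resolvePages option, this returns objects with the same path field the rest of the config expects, so excludes and serialize keep working unchanged.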
Deployment pipeline notes
The sitemap runs in onPostBuild, which fires after HTML generation. If your CI pipeline starts the deploy before the build completes, you can ship HTML without the sitemap. Confirm your deploy step waits for gatsby build to exit cleanly, and include public/sitemap-*.xml in your artifact upload. On Gatsby Cloud and Netlify this is automatic; on custom pipelines it is the #1 reason "the sitemap didn't deploy".
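On a custom pipeline, a small guard between the build and upload steps can catch a missing sitemap before it ships. This is a sketch that assumes the default public/ output directory and file names of the form sitemap-*.xml; the function name is hypothetical.

```javascript
const fs = require('node:fs');
const path = require('node:path');

// Return the sitemap files found in the build output, or throw if the index is missing.
function findSitemapFiles(publicDir = 'public') {
  const files = fs
    .readdirSync(publicDir)
    .filter(name => /^sitemap-.*\.xml$/.test(name));
  if (!files.includes('sitemap-index.xml')) {
    throw new Error(`sitemap-index.xml not found in ${publicDir}; did gatsby build finish?`);
  }
  return files.map(name => path.join(publicDir, name));
}

// In CI, call findSitemapFiles() after `gatsby build` and before the upload step;
// an uncaught error exits the Node process with a non-zero status.
```

Because the function throws when the index is absent, running it as its own pipeline step is enough to fail the deploy early.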
Step-by-Step Fix Guide
- Run npm install gatsby-plugin-sitemap and add it to the plugins array
- Set siteMetadata.siteUrl to your production origin - the plugin prepends it to every path
- Add excludes for dev routes, 404, preview paths, and staging-only patterns
- Override query to filter source plugin nodes by publish status and locale
- Use resolvePages to merge client-only routes or inject real lastmod from your CMS
- Run gatsby build and inspect public/sitemap-index.xml plus each shard
- Verify with curl https://yoursite.com/sitemap-index.xml that the file is deployed and returns 200
- Submit sitemap-index.xml to Google Search Console and watch the per-shard coverage