Fix Your Sitemap for Django
Django's django.contrib.sitemaps framework gives you a Sitemap class per model, but incorrect querysets, missing lastmod, and no index view mean production sites routinely ship sitemaps that miss pages or hit the 50k URL limit.
Django's sitemap framework is solid but opinionated. It assumes you want one Sitemap class per model, uses django.contrib.sites to build absolute URLs, and renders the XML on every request unless you cache it. On a small site that's fine. On a site with 300k articles and an updated_at field without an index, the first crawler hit brings the box to its knees.
Debugged a Django news site earlier this year. 180k articles, sitemap view taking 38 seconds to render because lastmod was calling obj.updated_at inside a loop that pulled the full object each iteration. Adding .only('slug', 'updated_at') plus a composite index on (is_published, published_at) dropped it to 1.2 seconds. After that, caching for 24 hours made it a non-issue.
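The shape of that fix, sketched against a hypothetical Article model (the field names follow the example in this article; adjust them to your schema, and run makemigrations afterwards to create the index):

```python
# models.py - hypothetical Article model showing the composite index;
# field names mirror this article's example, not a real project
from django.db import models


class Article(models.Model):
    slug = models.SlugField(unique=True)
    is_published = models.BooleanField(default=False)
    published_at = models.DateTimeField(null=True, blank=True)
    updated_at = models.DateTimeField(auto_now=True)  # feeds lastmod()

    class Meta:
        indexes = [
            # matches the filter (is_published=True, published_at__lte=now)
            # used by the sitemap queryset
            models.Index(fields=['is_published', 'published_at']),
        ]
```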
Common Django Sitemap Issues
- Querysets returning all records, including `is_published=False` drafts and scheduled posts
- Sitemap view timing out because the queryset has no `select_related` and triggers N+1 queries
- Sites framework misconfigured, causing `location()` to return paths without a domain
- A single `sitemap.xml` exceeding 50,000 URLs - Google rejects oversized feeds
- Missing `lastmod` because the model has no `updated_at` field
- Non-canonical URL patterns leaking in from legacy routes
- Multi-tenant sites serving one tenant's URLs to another because of global caching
- Sitemap view running behind auth middleware that requires login (subtle, but it happens)
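The last item is worth a concrete sketch. If a site-wide login middleware intercepts every request, crawlers get a 302 to the login page instead of XML. One approach, with illustrative names (this is not a Django API, just a helper such a middleware could call before redirecting):

```python
# Sketch: path prefixes that should bypass a hypothetical site-wide
# login-required middleware. Names here are illustrative.
SITEMAP_EXEMPT_PREFIXES = ('/sitemap', '/robots.txt')


def is_exempt(path, prefixes=SITEMAP_EXEMPT_PREFIXES):
    """Return True if a request path should skip the login requirement."""
    return path.startswith(prefixes)
```

Call this early in the middleware's `__call__` and fall through to the view when it returns True, so `/sitemap.xml` and each paginated section stay publicly reachable.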
Working sitemap class

```python
# sitemaps.py
from django.contrib.sitemaps import Sitemap
from django.utils import timezone

from blog.models import Article


class ArticleSitemap(Sitemap):
    changefreq = 'weekly'
    priority = 0.7
    protocol = 'https'
    limit = 40000  # leave headroom under the 50k cap

    def items(self):
        return (Article.objects
                .filter(is_published=True,
                        published_at__lte=timezone.now())
                .only('slug', 'updated_at')
                .order_by('-updated_at'))

    def lastmod(self, obj):
        return obj.updated_at

    def location(self, obj):
        return f'/blog/{obj.slug}/'
```

```python
# urls.py
from django.urls import path
from django.contrib.sitemaps.views import index, sitemap
from django.views.decorators.cache import cache_page

from .sitemaps import ArticleSitemap, StaticSitemap

sitemaps = {
    'articles': ArticleSitemap,
    'static': StaticSitemap,
}

urlpatterns = [
    path('sitemap.xml',
         cache_page(60 * 60 * 24)(index),
         {'sitemaps': sitemaps}),
    path('sitemap-<section>.xml',
         cache_page(60 * 60 * 24)(sitemap),
         {'sitemaps': sitemaps},
         name='django.contrib.sitemaps.views.sitemap'),
]
```

Sites framework setup
Django's sitemap framework uses Site.objects.get_current() to prepend a domain. If you never edit the default Site row, every URL in the sitemap is prefixed with example.com instead of your domain, and crawlers index nothing. Setup:
- Add `'django.contrib.sites'` to `INSTALLED_APPS`
- Set `SITE_ID = 1` in settings
- Run `python manage.py migrate`, then edit the Site row at `/admin/sites/site/`
- Set the domain to `example.com` (no scheme, no trailing slash)
- Use `protocol = 'https'` on each Sitemap class so URLs render as `https://`
Skipping the sites framework is the #1 reason a Django sitemap "works locally but breaks in prod".
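The admin edit can also be done from the command line, which is handy in deploy scripts. A sketch, assuming `SITE_ID = 1` and with `example.com` as a placeholder for your real domain:

```shell
# One-off setup: create the django_site table, then point the
# Site row at the production domain (no scheme, no trailing slash).
python manage.py migrate
python manage.py shell -c "
from django.contrib.sites.models import Site
Site.objects.update_or_create(pk=1, defaults={'domain': 'example.com', 'name': 'example.com'})
"
```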
Large catalogs and pagination
Django's Sitemap class paginates automatically at limit (default 50,000). The index view lists one entry per page - sitemap-articles.xml, then sitemap-articles.xml?p=2, and so on. Set limit = 40000 to stay under the hard cap with headroom. For catalogs above 1M URLs, consider pre-generating sitemap files to disk or S3 via a management command on a cron - rendering on demand will not scale.
Multi-tenant setup
If you use django-tenants or a subdomain router, the sitemap view needs to run inside the tenant's schema. Wrap the view with your tenant middleware (or call schema_context(tenant) explicitly), and cache by request.tenant.schema_name rather than globally. Otherwise tenant A will see tenant B's sitemap from cache.
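The keying itself is trivial but easy to get wrong; the point is that the tenant identifier must be part of the cache key. A sketch, where `schema_name` mirrors the `request.tenant.schema_name` attribute django-tenants exposes:

```python
# Sketch of per-tenant cache keying for sitemap responses.
def sitemap_cache_key(schema_name, section):
    """Build a cache key scoped to one tenant's sitemap section."""
    return f'sitemap:{schema_name}:{section}'
```

Use this key with explicit cache.get()/cache.set() calls inside the view instead of a global cache_page decorator, which knows nothing about tenants.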
Caching and performance
Cache the sitemap view with @cache_page(60 * 60 * 24) - Google rarely hits it more than once per day, and rebuilding hourly wastes DB capacity. If your content updates during the day, add a post_save signal that invalidates the cache key. Combine that with .only() to restrict the queryset to the fields you actually use and add an index on the timestamp column you sort by.
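A sketch of the invalidation signal. One caveat: cache_page derives its key from the URL and headers, so it's hard to delete directly - this sketch assumes the view caches under a key you control (the `'sitemap:articles'` key here is hypothetical):

```python
# signals.py - sketch of cache invalidation on save; assumes the view
# stores its response under an explicit key rather than via cache_page.
from django.core.cache import cache
from django.db.models.signals import post_save
from django.dispatch import receiver

from blog.models import Article


@receiver(post_save, sender=Article)
def invalidate_sitemap_cache(sender, instance, **kwargs):
    cache.delete('sitemap:articles')  # hypothetical key set by the view
```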
Step-by-Step Fix Guide
- Add `django.contrib.sitemaps` and `django.contrib.sites` to `INSTALLED_APPS`, set `SITE_ID`, update the Site row to your production domain
- Create a Sitemap subclass per model with filters in `items()` for published + past-dated records
- Add `updated_at = models.DateTimeField(auto_now=True)` and implement `lastmod()`
- Use the `index` view for `sitemap.xml` and `sitemap` for per-section files
- Set `limit = 40000` on large sitemaps; index the timestamp field
- Cache with `@cache_page(60 * 60 * 24)` and use `.only()` / `.select_related()` to avoid N+1 queries
- Verify that `curl -I https://yourdomain.com/sitemap.xml` returns 200 with no auth redirect
- Submit to Google Search Console and monitor the per-section coverage reports