I have created a sitemap index for my Django site because I have more than 50k URLs. Django says that it paginates sitemaps automatically, but I can't find the URL that serves the paginated result.

Relevant code:

#urls.py
...
sitemaps = {
    'state': StateSitemap,
    'school': SchoolSitemap,
}

urlpatterns = patterns('',
    ....    
    url(r'^sitemap\.xml$', 'django.contrib.sitemaps.views.sitemap', {'sitemaps': sitemaps}),
    url(r'^sitemap-(?P<section>.+).xml$','django.contrib.sitemaps.views.sitemap', {'sitemaps': sitemaps}),
)

The sitemap for "state" works just fine:

  • www.example.com/sitemap-state.xml

However, for school, which has 100k+ entries and should therefore be paginated automatically by Django, all of the following return a 404:

  • www.example.com/sitemap-school.xml
  • www.example.com/sitemap-school1.xml
  • www.example.com/sitemap-school/1.xml

I know I'm misunderstanding how ".+" works as part of the sitemap index URL, but I'm stumped.

Which URL should I use to see the paginated sitemap for "school"?

Looking at Django's views code, it seems pagination is handled by a GET parameter named p, not by the .+ in the regex. The .+ part of the regex captures the section name, not the page number. github.com/django/django/blob/master/django/contrib/sitemaps/… - sagarchalise
Yes, correct. Thank you for clarifying that and providing the link to the accurate documentation. - wsvincent
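
To make the comment above concrete, here is a minimal sketch of how the page URLs for a section could be enumerated, assuming the default limit of 50000 items per sitemap page and the p GET parameter mentioned above. The helper name sitemap_page_urls and the item count of 120000 are hypothetical, chosen only for illustration; the sketch uses the standard library rather than Django itself.

```python
import math
from urllib.parse import urlencode


def sitemap_page_urls(base_url, item_count, limit=50000):
    """Return one URL per sitemap page (hypothetical helper).

    Page 1 is the bare section URL; later pages add ?p=<page>.
    """
    pages = max(1, math.ceil(item_count / limit))
    urls = [base_url]
    for page in range(2, pages + 1):
        urls.append(base_url + "?" + urlencode({"p": page}))
    return urls


# Assumed count: 120000 schools, so three pages at 50000 per page.
school_urls = sitemap_page_urls(
    "http://www.example.com/sitemap-school.xml", 120000
)
for url in school_urls:
    print(url)
```

Under these assumptions the later pages would be reached at sitemap-school.xml?p=2 and sitemap-school.xml?p=3, rather than through a different path.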

1 Answer

You're missing a \ before the . in your URL pattern, so the dot matches any character instead of a literal period.

url(r'^sitemap-(?P<section>.+).xml$','django.contrib.sitemaps.views.sitemap', {'sitemaps': sitemaps}),

should be

url(r'^sitemap-(?P<section>.+)\.xml$','django.contrib.sitemaps.views.sitemap', {'sitemaps': sitemaps}),
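
As a rough check of what each pattern captures, the two regexes can be compared with Python's re module alone. This is only a sketch: it assumes the patterns are applied to the request path without its leading slash, and the helper section_for and the path sitemap-schoolxxml are made up for illustration.

```python
import re

# The pattern from the question (unescaped dot) and the corrected one.
UNESCAPED = re.compile(r'^sitemap-(?P<section>.+).xml$')
ESCAPED = re.compile(r'^sitemap-(?P<section>.+)\.xml$')


def section_for(pattern, path):
    """Return the captured section name, or None if the path does not match."""
    match = pattern.match(path)
    return match.group('section') if match else None


paths = [
    'sitemap-school.xml',
    'sitemap-school1.xml',
    'sitemap-school/1.xml',
    'sitemap-schoolxxml',  # hypothetical path with no literal dot
]
for path in paths:
    print(path, section_for(UNESCAPED, path), section_for(ESCAPED, path))
```

Both patterns capture school from sitemap-school.xml, but they capture school1 and school/1 from the other two URLs in the question. Neither of those is a key in the sitemaps dict, so a lookup by section name would not find them. Only the unescaped pattern accepts sitemap-schoolxxml.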