CONTEXT
Assume my website contains a listing of shops organized by country, state, city and locality. Each shop has a dynamically generated web page. The total number of shops will eventually reach ~1.5 million. I use NDB to store shop data. I plan to use XML sitemaps and submit them manually to the search engines. I use GAE Python.
PROBLEM
I want to maintain (generate and keep updated) url links in the sitemap for all shop pages.
Each unique url link for a shop page contains the following:
Country, State, City, Locality, Shop Name, Unique Index
e.g., www.example.com/country--state--city--locality--shop_name--unique_index
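A URL of that shape can be built with a small helper. This is only a sketch: the function name `shop_url`, the field order, and the space-to-hyphen slug rule are my assumptions, not anything your site already does.

```python
def shop_url(country, state, city, locality, shop_name, unique_index):
    """Join the location fields, shop name and index with '--',
    lower-cased with spaces replaced by '-' to keep the slug URL-safe.
    (The exact normalization rule is an assumption.)"""
    parts = [country, state, city, locality, shop_name, str(unique_index)]
    slug = "--".join(p.strip().lower().replace(" ", "-") for p in parts)
    return "http://www.example.com/" + slug

# Example:
print(shop_url("India", "Karnataka", "Bangalore", "Indiranagar", "Acme Shoes", 1024))
```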
Shops can be added or deleted, and their data (e.g., name or city) can change. I need to design a solution that keeps the sitemap up to date with links for all the shops. I intend to submit a new sitemap as soon as possible after any shop has been added/deleted/updated.
My Approaches
Approach 1
Generate the sitemap on the fly by querying the information from NDB models.
Cons of Approach 1
- NDB fetch limit of 10,000 entities.
- Datastore read operation free quota of 50,000 per day.
- High consumption of frontend instance hours.
- Request deadline of 60 seconds.
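For reference, a rough sketch of the XML-building half of Approach 1. I have kept it self-contained by taking a plain iterable of `(url, lastmod)` pairs; in production that iterable would come from an NDB query such as `Shop.query().iter(batch_size=500)` (the `Shop` model name and batch size are assumptions), which is exactly where the fetch limit, read quota and request deadline above start to bite at 1.5 million entities.

```python
def build_sitemap(shops):
    """Build sitemap XML from an iterable of (url, lastmod) pairs.
    On GAE, `shops` would be fed by an NDB query iterator."""
    lines = ['<?xml version="1.0" encoding="UTF-8"?>',
             '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">']
    for url, lastmod in shops:
        lines.append('  <url><loc>%s</loc><lastmod>%s</lastmod></url>'
                     % (url, lastmod))
    lines.append('</urlset>')
    return "\n".join(lines)
```

Note the sitemap protocol also caps a single file at 50,000 URLs, so one flat file cannot hold all 1.5 million shops in any case.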
Approach 2
Generate and store the sitemap on my laptop using a program (say X, written by me in Perl/Python). Whenever a shop gets added / deleted / updated on my website, I would update a file stored in GCS (Google Cloud Storage) with mnemonics like:
ADD < shop data like name, etc >
DELETE < shop data like name, etc >
UPDATE < shop data like name, etc >
I would download this file and feed it to my local program X, which would update the previously stored sitemap file.
Cons of Approach 2
- GCS does not support appending to an existing file; the entire file must be rewritten every time. So as the number of shops grows from 0 to 1.5 million, RAM usage and frontend instance hour consumption would keep climbing.
- Request deadline of 60 seconds.
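The core of program X could look like the sketch below. The on-disk format is an assumption: one mnemonic per line, with the shop URL as the payload and `UPDATE` carrying `old-url|new-url`. Your real file stores shop data fields rather than finished URLs, so this only illustrates the replay logic.

```python
def apply_changes(urls, change_lines):
    """Replay ADD/DELETE/UPDATE mnemonic lines against a set of
    sitemap URLs and return the updated set. The line format
    (op, space, payload; UPDATE payload as 'old|new') is assumed."""
    urls = set(urls)
    for line in change_lines:
        op, _, rest = line.strip().partition(" ")
        if op == "ADD":
            urls.add(rest)
        elif op == "DELETE":
            urls.discard(rest)
        elif op == "UPDATE":
            old, new = rest.split("|")  # old-url|new-url
            urls.discard(old)
            urls.add(new)
    return urls
```

The resulting set would then be serialized into the sitemap file and re-uploaded.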
Approach 3
The sitemap.xml file would contain:
- Entries with URLs for country-level sitemap index files. Each country sitemap file would contain entries for the URLs of state sitemap files; each state sitemap file, entries for city sitemap files; each city sitemap file, entries for locality sitemap files; and each locality sitemap file, entries for the URLs of the shop pages.
- Entries with URLs for all the static pages (FAQ, About Us, etc.).
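Each index level in this scheme is just a `<sitemapindex>` file pointing at the files one level down, so a single generator covers every level. A minimal sketch (the GCS path layout under `/sitemaps/` is an assumption):

```python
def build_sitemap_index(child_urls):
    """Build a sitemap index file whose entries point at child sitemap
    (or child index) files, e.g. one per country or per state."""
    lines = ['<?xml version="1.0" encoding="UTF-8"?>',
             '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">']
    for url in child_urls:
        lines.append('  <sitemap><loc>%s</loc></sitemap>' % url)
    lines.append('</sitemapindex>')
    return "\n".join(lines)

# e.g. the top-level index listing the per-country files:
top = build_sitemap_index([
    "http://www.example.com/sitemaps/india.xml",
    "http://www.example.com/sitemaps/usa.xml",
])
```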
Pros of Approach 3
- When a shop page gets added / deleted / updated, only that particular locality sitemap file needs to be regenerated.
Doubts with Approach 3
Can I store all the sitemap XML files in GCS? Do you foresee any problems with that?
Is it allowed to have multiple levels of sitemap index files pointing to other sitemap index files?
I have not been able to find a good solution. I have seen similar questions on SO and Nick's blog, but to no avail. I wish to stay within the free quota if possible. Please share your suggestions.