
I have a DNN site with over 20,000 pages. Googlebot and Bingbot crawl my website constantly.

When I look at my site log, I can see that Google and Bing are crawling my site via the page ID (e.g. www.url.com/Default.aspx?TabID=5000).

The bots hit my website every minute. When I add a new page, I expect the bots to crawl it, but instead I see them re-crawling very old pages, and it takes a couple of hours before they recognize the newly added page.

I have a robots.txt file with over 10,000 entries along these lines:

Disallow: /Default.aspx?TabID=5000
Disallow: /Default.aspx?TabID=5001
Disallow: /Default.aspx?TabID=5002

and so forth.
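
For reference, both Googlebot and Bingbot support the non-standard * wildcard and $ end-of-URL anchor in robots.txt, so one pattern line can stand in for many literal entries. A minimal sketch against the example URLs above (illustrative only, not lines from the actual file):

User-agent: *
# Block exactly TabID=5000; the $ anchors the end of the URL.
Disallow: /Default.aspx?TabID=5000$
# Block every TabID that starts with 5 (5, 500, 5001, 51234, ...).
# Prefix matching is coarse: robots.txt cannot express a numeric
# range such as "every TabID below 20000".
Disallow: /Default.aspx?TabID=5*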

So I am noticing a couple of issues:

1 - Googlebot and Bingbot are ignoring my disallows and re-crawling pages that I have listed in robots.txt. How do the bots know to go back and re-crawl old pages using the TabID?

2 - I still notice that when I add a new page, both bots are busy crawling old content and do not read my new content right away. Is there a way to force the Google and Bing bots to always read newly added pages first?

Thank you in advance for any suggestions.

What version of DotNetNuke are you on? Are you using any sitemap providers? Have you checked Webmaster Tools to see if the engines see your robots.txt file? – Chris Hammond
- Using version 5. - Not using any sitemap providers. - I checked in Webmaster Tools and it is reading the robots.txt file; the problem is that it only seems to allow around 100 disallow lines. So I really do not know how else to tell the bots not to check old pages. I want to block everything below page 20,000 (www.url.com/Default.aspx?TabID=20000), and I know I cannot add 20k rows to my robots.txt. Any suggestions? – Cesar
Do you want to block all page ID URLs? – unor
No, I do not want to block all page ID URLs. When I publish new pages, I notice that the bots are busy scanning old pages. I would like the robots to turn their attention to the new pages. Not sure if this is even possible. – Cesar

1 Answer


If you go to http://URL.com/sitemap.aspx, check which pages are listed there.
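
For reference, that page emits a standard sitemaps.org sitemap, so each page shows up as a <url> entry roughly like the sketch below (the URL and values are placeholders, not output from your site):

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <!-- Address of the page as DNN publishes it -->
    <loc>http://www.url.com/Default.aspx?TabID=5000</loc>
    <!-- Last-modified date; crawlers use this to spot new or changed pages -->
    <lastmod>2013-01-15</lastmod>
    <!-- Relative priority from 0.0 to 1.0; see the update below -->
    <priority>0.5</priority>
  </url>
</urlset>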

I would highly recommend upgrading to DNN 7, as it lets you control which pages show up in the sitemap; that may help you get your indexing issues under control.

UPDATE: Under the Admin menu, if you find a Search Engine Sitemap page, you can set a minimum page priority for inclusion in the sitemap. Then, for the pages you don't want to show up, you can lower their priority in the page settings.
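
Once the sitemap only lists the pages you actually want indexed, you can also nudge the engines to re-read it rather than waiting for their normal crawl cycle. Both Google and Bing have offered simple ping endpoints for this; a sketch, substituting your real sitemap URL:

http://www.google.com/ping?sitemap=http://www.url.com/sitemap.aspx
http://www.bing.com/ping?sitemap=http://www.url.com/sitemap.aspx

Requesting either URL after you publish a new page tells that engine the sitemap has changed; resubmitting the sitemap in each engine's Webmaster Tools has the same effect.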