2
votes

[Note to the wise: jump to last EDIT]

I have a very simple txt sitemap (named sitemap.txt) that looks like this:

http://myDomain.com
http://myDomain.com/about.html
http://myDomain.com/faq.html
http://myDomain.com/careers.html

When I load it up on webmaster tools I get:

Sitemap is HTML - Your Sitemap appears to be an HTML page. Please use a supported sitemap format instead

I tried a few alternatives (such as with or without www) but no luck.

Anyone any clue?

Any help appreciated!

EDIT:

I tried with an XML sitemap and got the same error, so it looks like the server is serving everything as HTML (as ceejayoz correctly suggests). Now the question is: how do I get the appspot server to serve text as plain text?

EDIT:

OK - I got fed up and implemented a servlet to serve my sitemaps (I am now trying both XML and TXT) explicitly as text/plain. Everything works fine if I invoke the servlet manually, but I still get "Sitemap is HTML". I don't know where to bang my head!

EDIT: I tried to verify content-type with a firefox plugin - everything seems to be coming up as expected (I am putting the actual URL so that people can have a look):

http://wokheisandbox.appspot.com/sitemaps/sitemap.txt --> Content-type: text/plain
http://wokheisandbox.appspot.com/sitemaps/sitemap.xml --> Content-type: application/xml

With my servlet (setting text/plain explicitly):

http://wokheisandbox.appspot.com/wokhei/serveSitemap?fileType=TXT --> Content-type: text/plain
http://wokheisandbox.appspot.com/wokhei/serveSitemap?fileType=XML --> Content-type: text/plain

All I still get from Webmaster Tools is --> Sitemap is HTML.
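
The same header check can be done without a browser plugin. Below is a minimal Python sketch, not the Java servlet from the question: a throwaway local server stands in for the appspot app and sets text/plain explicitly, and a HEAD request reads back the Content-Type it actually sends. The names SitemapHandler, fetch_content_type and demo are hypothetical, chosen only for this illustration.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Stand-in sitemap body; the real file would come from the app itself.
SITEMAP_BODY = b"http://myDomain.com\nhttp://myDomain.com/about.html\n"


class SitemapHandler(BaseHTTPRequestHandler):
    """Hypothetical handler that always answers with an explicit text/plain type."""

    def _send_headers(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(SITEMAP_BODY)))
        self.end_headers()

    def do_HEAD(self):
        self._send_headers()

    def do_GET(self):
        self._send_headers()
        self.wfile.write(SITEMAP_BODY)

    def log_message(self, format, *args):
        # Silence the default per-request logging.
        pass


def fetch_content_type(host, port, path):
    """Issue a HEAD request and return the media type without parameters."""
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        conn.request("HEAD", path)
        response = conn.getresponse()
        header = response.getheader("Content-Type", "")
    finally:
        conn.close()
    return header.split(";")[0].strip().lower()


def demo():
    """Start the stand-in server on a free port and check what it serves."""
    server = HTTPServer(("127.0.0.1", 0), SitemapHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        port = server.server_address[1]
        return fetch_content_type("127.0.0.1", port, "/sitemap.txt")
    finally:
        server.shutdown()
        server.server_close()
```

Pointing fetch_content_type at the real host and path (port 80) would show what that server sends for a plain request, though it cannot show what a crawler sees if the response differs by client.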

EDIT:

I think I found the reason --> I registered my site on Google Webmaster Tools as http://mydomain.com, but the app is hosted on appspot at http://myapp.appspot.com, which is mapped to mydomain.com. If I register http://myapp.appspot.com, everything works fine (sitemap validated).

This is good news, but it's not ideal because I want mydomain.com to be indexed ... any idea how to overcome this?
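
Since the mismatch is between the registered site and the host actually serving the app, a quick consistency check may help when debugging. The sitemaps.org protocol requires every listed URL to be on the same host as the sitemap. The Python sketch below is only an illustration of that check; the function name urls_off_host is hypothetical, and it compares host names only (case-insensitively), ignoring scheme and port.

```python
from urllib.parse import urlparse


def urls_off_host(registered_site, sitemap_lines):
    """Return sitemap URLs whose host differs from the registered site's host."""
    # urlparse(...).hostname is lower-cased, so myDomain.com matches mydomain.com.
    expected = urlparse(registered_site).hostname
    mismatched = []
    for line in sitemap_lines:
        url = line.strip()
        if not url:
            continue  # skip blank lines in the text sitemap
        if urlparse(url).hostname != expected:
            mismatched.append(url)
    return mismatched
```

For example, with http://mydomain.com as the registered site, an entry such as http://myapp.appspot.com/faq.html would be reported, while http://myDomain.com/about.html would not.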

5
You may consider posting this on Serverfault.com also/instead. - Travis
What's the real domain? How are you doing the domain forwarding? - Andrew Aylett
it's all done through google apps - real domain is www.wokhei.com - JohnIdol

5 Answers

5
votes

Sounds like your webserver is serving .txt files as text/html instead of text/plain.

For Apache, the following in a .htaccess file should fix it:

AddType text/plain .txt
1
votes

I found this thread discussing duplicate entries causing recent sitemap grief. I don't see this issue in your sitemap but you don't want any duplicates between entries. For example, make sure your sitemap doesn't contain BOTH of the following:

http://mydomain.com/ or http://www.mydomain.com/

AND

http://mydomain.com/index.html or http://www.mydomain.com/index.html

I think you posted your entire sitemap so, again, I don't think this is exactly your problem. You did mention that you have tried various URLs (with and without www). If you are validating the sitemap via Google Webmaster Tools, it may take up to 20 minutes for a correction to take effect. I hope it helps.
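
A rough way to spot that kind of duplicate in a text sitemap is sketched below in Python. It assumes, purely for illustration, that the www and non-www hosts and a trailing /index.html all refer to the same page; the function names canonical_key and find_duplicates are hypothetical, and query strings are ignored.

```python
from urllib.parse import urlparse


def canonical_key(url):
    """Reduce a URL to host + path, folding www. and a trailing index.html."""
    parts = urlparse(url.strip())
    host = parts.hostname or ""
    if host.startswith("www."):
        host = host[len("www."):]
    path = parts.path
    if path.endswith("/index.html"):
        path = path[: -len("index.html")]
    if path == "":
        path = "/"
    return host + path


def find_duplicates(urls):
    """Return (first_seen, duplicate) pairs for URLs sharing a canonical key."""
    seen = {}
    duplicates = []
    for url in urls:
        key = canonical_key(url)
        if key in seen:
            duplicates.append((seen[key], url))
        else:
            seen[key] = url
    return duplicates
```

Running find_duplicates over the lines of sitemap.txt returns an empty list when no two entries collapse to the same page under these assumptions.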

0
votes
<?xml version='1.0' encoding='utf-8' ?>
<urlset xmlns='http://www.sitemaps.org/schemas/sitemap/0.9'>
    <url>
        <loc>http://myDomain.com</loc>
    </url>
    <url>
        <loc>http://myDomain.com/about.html</loc>
    </url>
    <url>
        <loc>http://myDomain.com/faq.html</loc>
    </url>
    <url>
        <loc>http://myDomain.com/careers.html</loc>
    </url>
</urlset>

This way always works for me.
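
If the URL list changes often, the same structure can be generated rather than written by hand. This is a minimal Python sketch using the standard library's ElementTree; the function name build_sitemap is hypothetical, and it emits only the loc element, as in the file above.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a sitemap document (as a string) with one <url><loc> per URL."""
    # Register the sitemap namespace as the default so no prefix is emitted.
    ET.register_namespace("", SITEMAP_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for url in urls:
        url_el = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        loc = ET.SubElement(url_el, f"{{{SITEMAP_NS}}}loc")
        loc.text = url  # special characters such as & are escaped on output
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="utf-8"?>\n' + body
```

Writing the returned string to sitemap.xml as UTF-8 gives a file with the same urlset/url/loc layout as the hand-written one.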

0
votes

Just in case you change your mind about non-XML sitemaps:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd" xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.test.com/</loc>
    <lastmod>2009-08-03T23:40:40+00:00</lastmod>
    <changefreq>daily</changefreq>
    <priority>1.0</priority>
  </url>
  <url>
    <loc>http://test/</loc>
    <lastmod>2009-08-03T23:59:08+00:00</lastmod>
    <changefreq>weekly</changefreq>
    <priority>0.6</priority>
  </url>
</urlset>
-1
votes

I'm fairly certain that you need to provide an XML formatted sitemap file (sitemap.xml). See here for a format example: http://en.wikipedia.org/wiki/Sitemaps.