What are the correct content-types for XML, HTML and XHTML documents?
I need to write a simple crawler that only fetches these kinds of files.
Nowadays a URL such as http://example.net/index.html can serve, for example, a JPEG file because of mod_rewrite, so I cannot rely on the file extension. I need to read the Content-Type from the response headers and compare it against a list of allowed content types.
Where can I get such a list?
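
For context, here is a minimal sketch of the check I have in mind. The names `ALLOWED_TYPES` and `is_allowed` are hypothetical, and the entries in the set are only my assumption of commonly used values for HTML, XHTML and XML, not an authoritative list, which is exactly the part I am unsure about.

```python
# Assumed allow-list: placeholder entries for HTML, XHTML and XML.
# This is NOT an authoritative list; it only illustrates the comparison.
ALLOWED_TYPES = {
    "text/html",
    "application/xhtml+xml",
    "text/xml",
    "application/xml",
}


def is_allowed(content_type_header):
    """Return True if the header's media type is in ALLOWED_TYPES."""
    if not content_type_header:
        # Missing or empty header: treat as not allowed.
        return False
    # Drop parameters such as "; charset=utf-8" and normalize case,
    # since media type names are case-insensitive.
    media_type = content_type_header.split(";", 1)[0].strip().lower()
    return media_type in ALLOWED_TYPES
```

The crawler would call `is_allowed` with the value of the `Content-Type` response header and skip the body when it returns `False`.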