RFC 2616 states in section 3.7.1:

When no explicit charset parameter is provided by the sender, media subtypes of the "text" type are defined to have a default charset value of "ISO-8859-1" when received via HTTP.

This is why I usually send e.g. text/plain; charset=utf-8 as the Content-Type header.

What about media types of type application?

I often see and use headers like Content-Type: application/xml; charset=UTF-8. RESTeasy 2.3.7 then forces the client to also send the charset parameter in the Accept header; otherwise it answers with a 406. RESTeasy 3.0.6 seems to be more tolerant here, so I'm unsure what the best practice is.

1 Answer

RFC 2616 was obsoleted in June 2014 by a set of RFCs; the one containing the general HTTP specifications is RFC 7231. Please use the RFC editor to check the current status of RFCs.

RFC 7231 explicitly says (in Appendix B):

The default charset of ISO-8859-1 for text media types has been
removed; the default is now whatever the media type definition says.

On the other hand, RFC 6657, while anticipating such changes, declares:

The default "charset" parameter value for "text/plain" is unchanged from [RFC2046] and remains as "US-ASCII".

Thus, if your data is not ASCII (= US-ASCII), you should keep declaring the charset parameter explicitly.
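To illustrate the rule above, here is a minimal Python sketch of how a receiver might work out the charset in effect for a Content-Type header. The function name effective_charset and the MEDIA_TYPE_DEFAULT_CHARSETS table are made up for this example; only the text/plain default comes from the RFC 6657 quote, and any other type without an explicit parameter is reported as "no default known here".

```python
from email.message import Message

# Hypothetical table of per-type defaults. Only text/plain is filled in,
# based on the RFC 6657 quote; other types define their own rules.
MEDIA_TYPE_DEFAULT_CHARSETS = {"text/plain": "us-ascii"}


def effective_charset(content_type_header):
    """Return the charset implied by a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = content_type_header
    explicit = msg.get_param("charset")
    if explicit is not None:
        # An explicit parameter always wins over any default.
        return explicit.lower()
    # No parameter: fall back to the media type's own default, if known.
    return MEDIA_TYPE_DEFAULT_CHARSETS.get(msg.get_content_type())
```

With this sketch, "text/plain; charset=UTF-8" yields utf-8, a bare "text/plain" yields us-ascii, and a bare "application/xml" yields None, which is the case where the type's own rules have to decide.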

The XML specification, clause 4.3.3, specifies:

In the absence of external character encoding information (such as MIME headers), parsed entities which are stored in an encoding other than UTF-8 or UTF-16 MUST begin with a text declaration [...] containing an encoding declaration

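So for XML transmitted over HTTP, irrespective of content type, any encoding other than UTF-8 or UTF-16 MUST be stated explicitly, either in an HTTP header or in an encoding declaration, e.g. <?xml version='1.0' encoding='UTF-8'?> (the version attribute is required in a document's XML declaration).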
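A rough Python sketch of that decision for an XML response body follows. The function name xml_encoding and the order of checks (HTTP charset first, then byte order mark, then encoding declaration, then UTF-8) are assumptions made for illustration, not something the quoted clause prescribes; the declaration is only looked for in ASCII-compatible encodings.

```python
import codecs
import re

# Matches the encoding pseudo-attribute of a leading XML or text declaration.
_ENCODING_DECL = re.compile(
    rb"""^<\?xml[^>]*?encoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["']"""
)


def xml_encoding(body, http_charset=None):
    """Guess the encoding of an XML body (simplified, assumed precedence)."""
    if http_charset:
        # External information, e.g. the charset parameter of Content-Type.
        return http_charset.lower()
    if body.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    if body.startswith(codecs.BOM_UTF8):
        body = body[len(codecs.BOM_UTF8):]
    match = _ENCODING_DECL.match(body)
    if match:
        return match.group(1).decode("ascii").lower()
    # No external information and no declaration: UTF-8 is assumed.
    return "utf-8"
```

For example, a body starting with <?xml version='1.0' encoding='ISO-8859-1'?> and no charset parameter is reported as iso-8859-1, while a bare <a/> falls back to utf-8.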

For application types in general, type-specific rules may apply. Character encoding is irrelevant to most application types, as the types define their own encoding schemes, including the encoding of any embedded character data.