
A simple HTML file:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">

<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body>
<form method="post" action="test.jsp" accept-charset="utf-8" enctype="application/x-www-form-urlencoded">
    <input type="text" name="P"/>
    <input type="submit" value="subMit"/>
</form>
</body>
</html>

The server sends the HTML file with the header Content-Type:text/html; charset=utf-8. Everything says: "Dear browser, when you post this form, please post it UTF-8 encoded." The browser actually does this: every value entered in the input field is UTF-8 encoded. BUT the browser won't tell the server! The HTTP header of the POST request contains a Content-Type:application/x-www-form-urlencoded field, but the charset is omitted (tested with FF3.6 and IE8).

The problem is that the application server I use (Tomcat 6) expects the charset in the Content-Type header (as stated in RFC 2388), like this: Content-Type:application/x-www-form-urlencoded;charset=utf-8. If the charset is omitted, it assumes ISO-8859-1, which is not the charset the browser used for encoding. The result is broken data.
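To make the "broken data" concrete, here is a minimal sketch of the mismatch using only the JDK's java.net.URLDecoder. The class name CharsetMismatchDemo and the sample value are made up for illustration; this is not Tomcat's actual parameter parsing, just the same bytes decoded under the two charsets in question.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

public class CharsetMismatchDemo {

    // Decode one percent-encoded form value using the named charset.
    // Illustrative helper, not part of any servlet API.
    static String decode(String encoded, String charsetName) {
        try {
            return URLDecoder.decode(encoded, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        // U+00E9 (e with acute accent) is the two bytes C3 A9 in UTF-8,
        // so a browser posting from a UTF-8 page sends it as %C3%A9.
        String wire = "%C3%A9";

        // Correct charset: one character, U+00E9.
        System.out.println(decode(wire, "UTF-8"));

        // Assumed default: each byte becomes its own character,
        // U+00C3 followed by U+00A9.
        System.out.println(decode(wire, "ISO-8859-1"));
    }
}
```

The second call is what happens when the server falls back to ISO-8859-1: the two UTF-8 bytes are read as two separate Latin-1 characters.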

Does anyone have a clue how to force current browsers to append the charset to the Content-Type header?

I am running into exactly the same problem, and I've asked the Firefox developers on Google Groups for a way to solve it: groups.google.com/group/mozilla.dev.platform/browse_thread/… - Muhammad Hewedy

1 Answer


Does anyone have a clue how to force current browsers to append the charset to the Content-Type header?

No, no browser has ever supplied a charset parameter with the application/x-www-form-urlencoded media type. What's more, the HTML spec, which defines that type, does not propose a charset parameter, so the server can't reasonably expect to get one.

(HTML4 does expect a charset for the subparts of a multipart/form-data submission, but even in that case no browser actually complies.)

accept-charset="utf-8"

accept-charset is broken in IE, and shouldn't be used. It won't make a difference either way for forms in pages served as UTF-8, but in other cases it can end up with inconsistent results.

No, with forms you just have to serve the page they're in as UTF-8, and the results should come back as UTF-8, with no identifying marks to tell you that (except potentially for the _charset_ hack, but Tomcat doesn't support that).

So you have to tell the Servlet container what encoding to use for parameters if you don't want it to fall back to its default (which is usually wrong). In a limited set of circumstances you may be able to call ServletRequest.setCharacterEncoding() to do this, but this tends to be brittle, and doesn't work at all for parameters taken from the query string. There's not a standardised Servlet-level fix for this, sadly. For Tomcat you usually have to muck about with the server.xml instead of being able to fix it in the app.
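For reference, the _charset_ hack mentioned above works like this: if the form contains a hidden field named _charset_ with no value, browsers that support it fill in the name of the encoding they used for the submission. Since Tomcat doesn't act on it, the only way to use it there would be to parse the raw body yourself. The following is a rough sketch of that idea, not a recommended fix; the class CharsetFieldDemo and its parse method are hypothetical names, and it assumes you already have the undecoded request body as a string.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.LinkedHashMap;
import java.util.Map;

public class CharsetFieldDemo {

    private static final String CHARSET_FIELD = "_charset_=";

    // Hypothetical helper: parse a raw application/x-www-form-urlencoded body,
    // decoding with the charset named in a _charset_ field when one is present,
    // and with fallbackCharset otherwise.
    static Map<String, String> parse(String body, String fallbackCharset) {
        String[] pairs = body.split("&");

        // First pass: look for the _charset_ field. Charset names are plain
        // ASCII, so the raw value can be used without decoding.
        String charset = fallbackCharset;
        for (String pair : pairs) {
            if (pair.startsWith(CHARSET_FIELD) && pair.length() > CHARSET_FIELD.length()) {
                charset = pair.substring(CHARSET_FIELD.length());
            }
        }

        // Second pass: decode every name and value with the chosen charset.
        Map<String, String> params = new LinkedHashMap<>();
        for (String pair : pairs) {
            if (pair.isEmpty()) {
                continue;
            }
            int eq = pair.indexOf('=');
            String name = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            try {
                params.put(URLDecoder.decode(name, charset),
                           URLDecoder.decode(value, charset));
            } catch (UnsupportedEncodingException e) {
                throw new IllegalArgumentException("Unknown charset: " + charset, e);
            }
        }
        return params;
    }
}
```

Calling parse("_charset_=UTF-8&P=%C3%A9", "ISO-8859-1") decodes P as UTF-8, while the same body without the _charset_ field falls back to ISO-8859-1 and produces the garbled two-character value. Note that the charset name comes from the client, so real code would want to check it against a list of encodings it is prepared to accept.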