1
votes

I have an HTML form:

<p> Select beer characteristics </p>
<p> 
  Color: 
  <select name="color" size="1">
    <option value="light"> light </option>
    <option value="amber"> amber </option>
    <option value="brown"> brown </option>
    <option value="dark"> dark </option>
  </select>
  <br><br> 
</p>
<input type = "submit" value="submit">
  • For the input parameter name="color", there are four options: light, amber, brown, and dark.
  • Based on which value is chosen, a result page is shown.
  • However, when I select an option, junk characters are added before and after the option value.
  • While debugging, this is the value I get from request.getParameter("color") after selecting "amber": â€amberâ€
  • This causes a problem in the back end, where I want to do a string match against the input parameter.

Any suggestions?


4 Answers

2
votes

This is the result of the browser using the wrong encoding, most probably because no charset is set on the response. You can try:

response.setContentType("text/html; charset=UTF-8");
2
votes

I am quite sure that this is related to character encoding or URL encoding mismatches.

First of all, make sure to specify a charset on the form:

<form action="..." method="..." accept-charset="UTF-8">
    <select ...> ... </select>
</form>

Once the client sends the data in a known encoding (UTF-8), the server side has to be configured to read it with that same encoding.

I don't know what you're using, but one method is:

URLDecoder.decode(formParams, "UTF-8");

To be sure, you can add an encoding to your HTML file as well:

<head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
    ...
</head>

Edit: make sure everything is sent and received with the correct encoding as well.

Sending the HTML file from the server:

1) Make sure to set this header:
Content-Type: text/html; charset=UTF-8

If you're sending a file, make sure the file itself is saved in UTF-8. If your HTML is a generated String, use:

PrintWriter writer = new PrintWriter(new OutputStreamWriter(httpOutputStream, "UTF-8"));
writer.print(string);
...
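
As a rough, self-contained sketch of the writer snippet above: a ByteArrayOutputStream stands in for the real HTTP output stream, and the class and method names (Utf8WriterDemo, render) are made up for illustration. It only shows that the charset is passed explicitly and that the writer is flushed before the bytes are used.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of any servlet API.
class Utf8WriterDemo {
    // Writes the given HTML through a UTF-8 writer and returns the raw bytes.
    static byte[] render(String html) {
        // Assumption: this in-memory stream stands in for the HTTP output stream.
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        PrintWriter writer = new PrintWriter(new OutputStreamWriter(out, StandardCharsets.UTF_8));
        writer.print(html);
        writer.flush(); // without a flush the bytes may still sit in the writer's buffer
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] bytes = render("<p>caf\u00E9</p>");
        // The non-ASCII character takes two bytes in UTF-8, so this is one more than the char count.
        System.out.println(bytes.length);
    }
}
```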

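The URL-encoded data from the request arrives as US-ASCII bytes. Since ASCII is a subset of UTF-8, you can turn the bytes into a String and then URL-decode it as UTF-8: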

String urlEncodedString = new String(receivedBytes, "UTF-8");
String decoded = URLDecoder.decode(urlEncodedString, "UTF-8");
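
Put together, the two decoding steps might look like the sketch below. The class and method names (FormDecodeDemo, decodeValue) and the sample percent-encoded value are assumptions for illustration; in a servlet container the framework normally does this decoding for you, provided the request encoding is configured.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class showing manual decoding of a form value.
class FormDecodeDemo {
    // Turns raw percent-encoded bytes into the decoded parameter value.
    static String decodeValue(byte[] receivedBytes) {
        // Percent-encoded data only contains ASCII characters.
        String urlEncoded = new String(receivedBytes, StandardCharsets.US_ASCII);
        try {
            // Interpret the %XX escapes as UTF-8 bytes.
            return URLDecoder.decode(urlEncoded, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always supported, so this should not happen in practice.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Sample input: "amber" wrapped in percent-encoded curly quotes (U+201D).
        byte[] raw = "%E2%80%9Damber%E2%80%9D".getBytes(StandardCharsets.US_ASCII);
        String value = decodeValue(raw);
        System.out.println(value.equals("amber")); // false: the curly quotes are part of the value
    }
}
```

Note that correct decoding does not make the quotes go away; if the HTML really contains curly quotes, the decoded value still will not equal "amber".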
2
votes

You're using the wrong kind of quote characters in your HTML code.

What you probably have is something like this:

<option value=“light“>

Unless you use straight double quotes (") or straight single quotes (') to enclose an attribute, the browser treats the curly quotes as part of the value, so it reads the value as “light“ and not light, and that's what it sends to the server.

(Note that this wouldn't be valid in XHTML, where only quoted attributes are allowed, but in plain HTML specifying attributes in a <foo bar=value> format works.)

The strange output can be explained by the fact that your browser and your server use different encodings: one uses ISO-8859-1 and the other UTF-8. A curly double quotation mark is encoded in UTF-8 as three bytes: 0xE2 0x80 0x9C for the left mark and 0xE2 0x80 0x9D for the right mark. When these bytes are read as ISO-8859-1 (which in practice is usually treated as Windows-1252), the first two bytes give exactly the two characters you mention, â and €. (The third byte, 0x9D, is unassigned in Windows-1252 and is typically dropped or not displayed.)

The encoding mismatch is a separate problem that needs to be remedied too; see the other answers for tips on dealing with it.
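
The byte-level explanation can be reproduced with a short sketch. It assumes the value was sent as UTF-8 with right curly quotes (U+201D) around it and read back as Windows-1252; the class and method names (MojibakeDemo, misdecode) are made up for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class reproducing the garbled parameter value.
class MojibakeDemo {
    // Encodes the text as UTF-8, then decodes those bytes with a different charset.
    static String misdecode(String text, Charset wrongCharset) {
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, wrongCharset);
    }

    public static void main(String[] args) {
        // Assumption: the attribute value was wrapped in curly quotes (U+201D).
        String sent = "\u201Damber\u201D";
        String seen = misdecode(sent, Charset.forName("windows-1252"));
        // The first two characters come from bytes 0xE2 and 0x80.
        System.out.println(seen.charAt(0) == '\u00E2'); // a with circumflex
        System.out.println(seen.charAt(1) == '\u20AC'); // euro sign
        System.out.println(seen.contains("amber"));
    }
}
```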

0
votes

I faced the same problem while converting XHTML to PDF using the wkhtmltopdf tool.

Adding <meta http-equiv="Content-Type" content="text/html; charset=UTF-8"> to my HTML template resolved the issue.