I want my application to fully support ISO-8859-1 on a Jetty server, but I am unable to change the default character encoding to ISO-8859-1. Where do I need to set the encoding/charset?

This is for jetty-distribution-9.4.12, running a Struts web application. I have tried modifying webdefault.xml for the encoding mappings, but somehow it fails to take UTF-8 for encoding.

I am seeing an issue when giving an XML resource a name containing Japanese characters (私のユーザー). The Jetty server always fails to accept this name for my resource. When I inspect the request, I see that the content type is UTF-8 and the protocol is HTTP 1.1. I want my server to accept 私のユーザー as the resource name, so I wanted to add that compatibility to the server. With the little knowledge I have, I did some research and tried a few server configurations, but nothing seems to work.

Trial 1

Changing webdefault.xml with a locale-encoding mapping

<locale-encoding-mapping>
  <locale>en</locale>
  <encoding>ISO-8859-1</encoding>
</locale-encoding-mapping>

Trial 2

Adding the encoding property to JAVA_OPTIONS in the jetty.sh file

JAVA_OPTIONS+=("-Dfile.encoding=UTF-8")

Links I referred to:

Jetty Character encoding issue

Jetty 9, character encoding UTF-8

So you want a subset of UTF-8 only? Latin-1 only. No international character support? You want to intentionally break your support for Chrome, Firefox, and Microsoft Edge? You want to use an old / obsolete HTTP spec? That's what using ISO-8859-1 means - stackoverflow.com/questions/7048745/… - Joakim Erdfelt
Japanese characters are never ISO-8859-1. This updated question is in conflict with itself. Also, are you asking about filenames or content? (Two completely different concepts.) Your trials have zero impact on filenames, but your question seems to indicate that you want filenames to contain Japanese characters. If it is filenames, it is imperative that you understand where your filenames are stored (e.g. Windows vs Linux vs OSX vs in a JAR). - Joakim Erdfelt

1 Answer

Jetty uses the current HTTP/1.1 specs (yep, all of these specs talk about current HTTP/1.1 specific behavior)

I think the most relevant spec to your question is from RFC7231 - Appendix B: Updates from RFC2616

   The default charset of ISO-8859-1 for text media types has been
   removed; the default is now whatever the media type definition says.
   Likewise, special treatment of ISO-8859-1 has been removed from the
   Accept-Charset header field.  (Section 3.1.1.3 and Section 5.3.3)

The idea of ISO-8859-1 being the default charset was deprecated long ago. The only place you'll find ISO-8859-1 indicated as a default charset is in old specs that have since been labelled "obsolete" (such as RFC2616).

Timeline:

  1. The older HTTP/1.1 spec, RFC2616, was released in 1999.
  2. The faults in RFC2616 were identified and a revised spec started being discussed in 2006.
  3. The updated specs RFC7230 thru RFC7235 were released in June 2014.
  4. All of the major browser vendors (Chrome, Firefox, Edge, Safari, etc.) updated that year to support RFC7230 and related specs.
  5. Over the years since, the major browsers have started to drop RFC2616 concepts and support, removing behaviors, and even quietly dropping features that come from other obsolete specs (e.g. the older Set-Cookie header syntax now results in a no-op on the browser side, with the cookie being dropped).

Today (Sept 2019):

  • The HTTP 1.1 protocol has a default character encoding of UTF-8.
  • The HTTP 1.1 document default character encoding is UTF-8.
  • The HTTP 2 protocol has a default character encoding of UTF-8.
  • The HTTP 2 document default character encoding is UTF-8.

What all Web Developers today are responsible for:

  • You MUST limit your HTTP 1.1 protocol usages (header names, header values) to US-ASCII.
  • Header names should follow HTTP 1.1 token rules (this is a subset of US-ASCII).
  • Header values that contain a character outside of US-ASCII [1] MUST first be encoded in UTF-8, and then the resulting bytes percent-encoded as hex for representation in the header value.
  • If you intend to send an ISO-8859-1 document as a response body, then you MUST indicate the mime-type and charset in the HTTP response Content-Type header (e.g. Content-Type: text/html; charset=ISO-8859-1).
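To make the "UTF-8 first, then percent-encode" bullet concrete, here is a minimal sketch in plain Java. The class name HeaderValueEncoder and the choice of which characters to leave unescaped (the RFC 3986 "unreserved" set) are my assumptions for illustration; the exact allowed set depends on the grammar of the header you are writing, so treat this as a starting point rather than the rule for every header.

```java
import java.nio.charset.StandardCharsets;

public class HeaderValueEncoder {
    // Assumption: only RFC 3986 "unreserved" characters pass through unescaped.
    private static final String UNRESERVED =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~";

    // Encode the text as UTF-8 bytes, then percent-encode every byte
    // that is not in the unreserved set. The result is pure US-ASCII.
    public static String percentEncodeUtf8(String value) {
        StringBuilder out = new StringBuilder();
        for (byte b : value.getBytes(StandardCharsets.UTF_8)) {
            int octet = b & 0xFF;
            if (UNRESERVED.indexOf(octet) >= 0) {
                out.append((char) octet);
            } else {
                out.append('%').append(String.format("%02X", octet));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The resource name from the question, written with Unicode escapes
        // so the source file itself stays US-ASCII.
        String name = "\u79c1\u306e\u30e6\u30fc\u30b6\u30fc";
        System.out.println(percentEncodeUtf8(name));
    }
}
```

Each Japanese character in the name becomes three UTF-8 bytes, so the six-character name turns into eighteen %XX triplets, all of which are safe to place in a header value.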

But since you didn't indicate where in the HTTP exchange you want to set this default character encoding, it's hard to give a detailed answer/solution to your issue. (E.g. it could be a problem with the encoding of your application/x-www-form-urlencoded request body content and its interaction with the Servlet spec, which can be fixed with an additional field in your HTML5 form, btw.)
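If the problem does turn out to be form decoding, the following sketch shows the failure mode using only the JDK (Java 10 or later for this URLDecoder overload). The class name FormDecodingDemo and the sample field value are hypothetical; the point is only that the same percent-encoded bytes produce different text depending on which charset the receiver assumes.

```java
import java.net.URLDecoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class FormDecodingDemo {
    // Decode a percent-encoded form value using the charset the receiver assumes.
    public static String decode(String encoded, Charset assumed) {
        return URLDecoder.decode(encoded, assumed);
    }

    public static void main(String[] args) {
        // Hypothetical form value: the first two characters of the resource
        // name from the question, sent by the browser as UTF-8 bytes.
        String encoded = "%E7%A7%81%E3%81%AE";

        String asUtf8 = decode(encoded, StandardCharsets.UTF_8);
        String asLatin1 = decode(encoded, StandardCharsets.ISO_8859_1);

        // UTF-8 recovers the 2 original characters; ISO-8859-1 maps each of
        // the 6 bytes to its own character, which is the classic mojibake.
        System.out.println(asUtf8.length());   // 2
        System.out.println(asLatin1.length()); // 6
    }
}
```

So a server that assumes ISO-8859-1 for a body that was actually sent as UTF-8 will not reject the name; it will silently store six wrong characters instead of two correct ones.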

[1]: This might seem harsh, but if you check RFC 7230: 3.2.4 Field Parsing you'll see that characters outside of US-ASCII in HTTP header fields will at best be dropped, or at worst be interpreted as an obs-fold or obs-text character, rendering the entire request bad and resulting in a 400 Bad Request.

   Historically, HTTP has allowed field content with text in the
   ISO-8859-1 charset [ISO-8859-1], supporting other charsets only
   through use of [RFC2047] encoding.  In practice, most HTTP header
   field values use only a subset of the US-ASCII charset [USASCII].
   Newly defined header fields SHOULD limit their field values to
   US-ASCII octets.  A recipient SHOULD treat other octets in field
   content (obs-text) as opaque data.
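A quick way to check up front whether a value is safe to put in a header as-is, or whether it needs the UTF-8 plus percent-encoding treatment, is to ask the JDK whether a charset can represent it. This is a minimal sketch; the class name CharsetCheck is hypothetical, and it only tests representability, not the header's full grammar.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetCheck {
    // True if every character of the text can be represented in the charset.
    public static boolean canEncode(String text, Charset charset) {
        return charset.newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        // The resource name from the question, as Unicode escapes.
        String name = "\u79c1\u306e\u30e6\u30fc\u30b6\u30fc";

        System.out.println(canEncode(name, StandardCharsets.US_ASCII));   // false
        System.out.println(canEncode(name, StandardCharsets.ISO_8859_1)); // false
        System.out.println(canEncode(name, StandardCharsets.UTF_8));      // true
    }
}
```

The second line is the one that matters for the question: the Japanese name cannot be represented in ISO-8859-1 at all, so switching the server to ISO-8859-1 cannot make it work.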