When loading a Properties object in Java, the input stream must be encoded in ISO-8859-1. In practice, however, I have to pass UTF-8 rather than ISO-8859-1 to the native2ascii tool when converting non-Latin properties files.

  • According to the JDK documentation, the input stream should be encoded in ISO-8859-1. That is to say, the source file is encoded in ISO-8859-1.
  • Since decoding and encoding should use the same charset, the Properties class in Java should decode using ISO-8859-1.
  • In testing, however, the conversion only works when UTF-8, not ISO-8859-1, is passed as the -encoding option to native2ascii. Why?

The test is as follows:

  • Create a test.properties file that contains: "key=Ü"
  • Generate the ISO-8859-1 property file, which ends up containing key=\u00c3\u009c:

      native2ascii -encoding ISO-8859-1 test.properties iso88591.properties
    
  • Generate the UTF-8 property file, which ends up containing key=\u00dc:

      native2ascii -encoding UTF-8 test.properties utf8.properties 
    
  • Create a Properties object and load each of the two generated property files in turn:

    import java.io.FileInputStream;
    import java.io.InputStream;
    import java.util.Properties;

    Properties p = new Properties();
    // Swap in iso88591.properties here to test the other generated file.
    try (InputStream inStream = new FileInputStream("src/test/java/com/active/translation/utf8.properties")) {
        p.load(inStream);  // the stream is closed automatically after loading
    }
    
    System.out.println(p.getProperty("key"));
    
  • Result for iso88591.properties: Ã (followed by the non-printing character U+009C)

  • Result for utf8.properties: Ü
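
The loading step can also be reproduced without any files. The following is a minimal sketch, not part of the original test: the class name EscapeLoadDemo and its helper methods are hypothetical, and the two escaped lines are copied from the generated files above. It feeds each line to Properties.load(InputStream) and prints the code points of the resulting value.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

// Hypothetical helper class, for illustration only.
public class EscapeLoadDemo {

    // Loads properties text through Properties.load(InputStream) and returns the value of "key".
    static String loadKey(String propertiesText) {
        // The escaped text is pure ASCII, so its ISO-8859-1 bytes are unambiguous.
        byte[] bytes = propertiesText.getBytes(StandardCharsets.ISO_8859_1);
        Properties p = new Properties();
        try (InputStream in = new ByteArrayInputStream(bytes)) {
            p.load(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return p.getProperty("key");
    }

    // Formats every char of the string as U+XXXX so invisible characters show up.
    static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", (int) c));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Content of the file generated with -encoding ISO-8859-1
        System.out.println(codePoints(loadKey("key=\\u00c3\\u009c"))); // prints U+00C3 U+009C
        // Content of the file generated with -encoding UTF-8
        System.out.println(codePoints(loadKey("key=\\u00dc")));        // prints U+00DC
    }
}
```

Both files load without error. The difference is only in which escapes native2ascii wrote, so the wrong characters are already present in the generated file before Properties ever reads it.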

Answer:

That -encoding needs to match the actual encoding used in your source file. From the looks of it, that is UTF-8. – Thilo Apr 3 at 2:52

"According to JDK doc," which JDK documentation, specifically? - Matt Ball
That -encoding needs to match the actual encoding used in your source file. From the looks of it, that is UTF-8. - Thilo
In Unicode that character is U+00DC. In UTF-8 it is encoded as the two bytes 0xC3 0x9C. When that file was read with ISO-8859-1 encoding, the UTF-8 encoded character was read as two ISO-8859-1 encoded characters: 0xC3 and 0x9C. As Thilo suggested, make the -encoding parameter match the actual encoding of the file. - Dave Newman
JDK doc: docs.oracle.com/javase/6/docs/api/java/util/Properties.html. " except the input/output stream is encoded in ISO 8859-1 character encoding" - Ian Jiang
As Thilo suggested, the -encoding option must name the actual encoding of the source file. When test.properties is saved with "ISO-8859-1" encoding, 'native2ascii -encoding ISO-8859-1 test.properties iso88591.properties' produces key=\u00dc, and the value returned by getProperty("key") is correct. - Ian Jiang
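
The byte-level explanation in the comments above can be checked with a short sketch. It assumes test.properties was saved as UTF-8, as the comments suggest. The class MojibakeDemo and its escape helper are hypothetical; the helper only imitates the escaped output of native2ascii for illustration and is not the tool's actual implementation.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
public class MojibakeDemo {

    // Renders bytes as space-separated uppercase hex pairs.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // Decodes the bytes with the given charset, then writes every non-ASCII
    // char as a backslash-u escape, similar to what native2ascii emits.
    static String decodeAndEscape(byte[] bytes, Charset charset) {
        String decoded = new String(bytes, charset);
        StringBuilder sb = new StringBuilder();
        for (char c : decoded.toCharArray()) {
            if (c < 0x80) {
                sb.append(c);
            } else {
                sb.append(String.format("\\u%04x", (int) c));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // U+00DC as it is stored on disk in a UTF-8 file
        byte[] onDisk = "\u00dc".getBytes(StandardCharsets.UTF_8);
        System.out.println(hex(onDisk)); // prints C3 9C

        // Wrong charset: each byte becomes its own character
        System.out.println(decodeAndEscape(onDisk, StandardCharsets.ISO_8859_1)); // prints \u00c3\u009c

        // Matching charset: the two bytes decode to the single original character
        System.out.println(decodeAndEscape(onDisk, StandardCharsets.UTF_8)); // prints \u00dc
    }
}
```

The -encoding option describes how to read the input file. The ISO-8859-1 requirement in the Properties documentation applies to the escaped output file, which is plain ASCII either way.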
