I have a JavaEE project, in which I use message properties files. The encoding of those file is set to UTF-8. In the file I use the german umlauts like ä
, ö
, ü
. The problem is, sometimes those characters are replaced with unicode like \uFFFD\uFFFD
, but not for every character. Now, I have a case where ä
and ü
are both replaced with \uFFFD\uFFFD
, but not for every occurring of ä
and ü
.
The Git diff shows me something like this:
mail.adresses=E-Mail hinzufügen:
-mail.adresses.multiple=E-Mails durch Kommata getrennt hinzufügen.
+mail.adresses.multiple=E-Mails durch Kommata getrennt hinzuf\uFFFD\uFFFDgen.
mail.title=Einladungs-E-Mail
box.preview=Vorschau
box.share.text=Sie können jetzt die ausgewählten Bilder mit Ihren Freunden teilen.
@@ -6880,7 +6880,7 @@ browser.cancel=Abbrechen
browser.selectImage=übernehmen
browser.starImage=merken
browser.removeImage=Löschen
-browser.searchForSimilarImages=ähnliche
+browser.searchForSimilarImages=\uFFFD\uFFFDhnliche
browser.clear_drop_box=löschen
Also, there are lines changed, which I have not touched. I don't understand why I get such a behavior. What could be the cause for the above problem?
My system:
Antergos / Arch Linux
System encoding UTF-8
Python 3.5.0 (default, Sep 20 2015, 11:28:25) [GCC 5.2.0] on linux Type "help", "copyright", "credits" or "license" for more information. >>> import sys >>> sys.getdefaultencoding() 'utf-8'
Eclipse Mars 1
- Tomcat 8
- Java JDK 8
If I use another Editor like Atom to edit those message properties files, I don't ran into this problem.
I also realized in a case, if I copy the original value browser.searchForSimilarImages=ähnliche
from Git diff and replace the wrong value browser.searchForSimilarImages=\uFFFD\uFFFDhnliche
in Eclipse with that, then I have the correct umlauts in the message properties file.
new String(value.getBytes("ISO-8859-1"), "UTF-8");
to have it correct in the properties file? – BuZZ-dEE