The .java source files in our Maven project, which is stored in Subversion, are mostly ASCII, and some files are UTF-8.
I think the intention was that these files would be UTF-8. In the pom file the source encoding is specified as UTF-8.
Now our build fails. Specifically, our SonarQube analysis fails on a .java file which is ISO-8859 and which has a variable with a special character. Using a special character is not a good idea, I think, but that aside, shouldn't the Java files have a consistent (UTF-8) encoding?
Or does it not matter that most are ASCII and only some are UTF-8? It is the thought that counts?
By the way, I don't understand how these files end up with ASCII encoding. When I use an IDE or an editor like SublimeText, files end up as UTF-8.
I only get ASCII when I use Notepad on MS Windows, and Java developers do not typically use that for programming.
Should we change the source files to use UTF-8? Or maybe it doesn't matter and we can leave this as it is?
As an example: on MS Windows I created one file using SublimeText and one file using Notepad.exe. I put the text 1234Ï in both files. The text contains a special character, an I with two dots.
When I look at these files on Linux using file:
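To make the difference between the two encodings concrete, here is a small sketch that prints the bytes of 1234Ï in each of them. The helper name hex_bytes is made up for this example, and the octal escapes are typed by hand; they are not read from the actual files.

```shell
# Hypothetical helper: print stdin as one run of hex digits.
hex_bytes() {
    od -An -tx1 | tr -d ' \n'
    echo
}

# ISO-8859-1 / ISO-8859-15: Ï is the single byte 0xCF (octal 317)
printf '1234\317' | hex_bytes               # 31323334cf

# UTF-8: Ï (U+00CF) is the two bytes 0xC3 0x8F (octal 303 217)
printf '1234\303\217' | hex_bytes           # 31323334c38f

# UTF-8 with BOM: the same bytes, preceded by EF BB BF (octal 357 273 277)
printf '\357\273\2771234\303\217' | hex_bytes   # efbbbf31323334c38f
```

Read as UTF-8, a byte 0xCF announces a two-byte sequence, so a file that ends in a lone 0xCF is cut off in the middle of a character from a UTF-8 decoder's point of view.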
ostraaten@io:/tmp/iconv$ file sublimtext.txt
sublimtext.txt: UTF-8 Unicode (with BOM) text, with no line terminators
ostraaten@io:/tmp/iconv$ file notepad.txt
notepad.txt: ISO-8859 text, with no line terminators
ostraaten@io:/tmp/iconv$
So this shows Notepad saved the file as ISO-8859 regardless of the contents. When I check the files using iconv:
ostraaten@io:/tmp/iconv$ iconv -f UTF-8 notepad.txt -o /dev/null
iconv: incomplete character or shift sequence at end of buffer
ostraaten@io:/tmp/iconv$ iconv -f UTF-8 sublimtext.txt -o /dev/null
ostraaten@io:/tmp/iconv$
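The same iconv check could be run over a whole source tree to find the files that are not valid UTF-8. This is only a minimal sketch: the function name list_non_utf8 and the default directory src are assumptions, not something from our project. It passes -t UTF-8 explicitly so the result does not depend on the locale of the machine.

```shell
# Print every .java file under a directory that does not decode as UTF-8.
# Assumption: "src" is only a placeholder default for the source root.
list_non_utf8() {
    root="${1:-src}"
    find "$root" -type f -name '*.java' | while IFS= read -r f; do
        # iconv exits non-zero when the input is not valid UTF-8
        if ! iconv -f UTF-8 -t UTF-8 "$f" > /dev/null 2>&1; then
            printf '%s\n' "$f"
        fi
    done
}

# Usage (hypothetical path):
#   list_non_utf8 src/main/java
```

Pure ASCII files pass this check too, because every ASCII byte is already valid UTF-8 as it stands. Only files containing bytes above 0x7F in some other encoding get listed.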
I can open and save the file notepad.txt using SublimeText; the encoding still shows up as ISO-8859.
The character does display correctly in both files. So this supports the idea that somewhere the editor tries to determine the encoding from the contents of the file, but somewhere else the file is still marked and recognized as ISO-8859.
I can change the encoding using iconv:
ostraaten@io:/tmp/iconv$ iconv -f ISO-8859-15 -t UTF-8 notepad.txt > notepad-utf8.txt
ostraaten@io:/tmp/iconv$ file notepad-utf8.txt
notepad-utf8.txt: UTF-8 Unicode text, with no line terminators
ostraaten@io:/tmp/iconv$
ostraaten@io:/tmp/iconv$ iconv -f UTF-8 notepad-utf8.txt -o /dev/null
The conversion was successful: the "incomplete character" message from iconv is gone.
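If we decide to convert, the manual steps above could be wrapped in a small function. This is a sketch under assumptions: the name to_utf8 is made up, and it assumes every non-UTF-8 file is ISO-8859-15 as in my iconv call above, which may not hold for every file in the repository. It skips files that already decode as UTF-8, so running it twice does not encode a file a second time.

```shell
# Convert one file to UTF-8 in place, unless it already decodes as UTF-8.
# Assumption: the default source encoding ISO-8859-15 is a guess; pass
# another encoding as the second argument if a file uses something else.
to_utf8() {
    f="$1"
    from="${2:-ISO-8859-15}"
    # Already valid UTF-8 (this includes pure ASCII): leave the file alone
    if iconv -f UTF-8 -t UTF-8 "$f" > /dev/null 2>&1; then
        return 0
    fi
    # Convert into a temporary file so a failure leaves the original intact
    if iconv -f "$from" -t UTF-8 "$f" > "$f.utf8.tmp"; then
        mv "$f.utf8.tmp" "$f"
    else
        rm -f "$f.utf8.tmp"
        return 1
    fi
}

# Usage:
#   to_utf8 notepad.txt
```

Since the wrong source encoding still produces valid UTF-8 with the wrong characters, it seems wise to review the resulting diff before committing.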