I tried to use java.io.FileReader to read some text files and convert them into a string, but I found the result is wrongly encoded and not readable at all.
Here's my environment:
Windows 2003, OS encoding: CP1252
Java 5.0
My files are UTF-8 encoded or CP1252 encoded, and some of them (UTF-8 encoded files) may contain Chinese (non-Latin) characters.
I use the following code to do my work:
private static String readFileAsString(String filePath)
throws java.io.IOException{
StringBuffer fileData = new StringBuffer(1000);
FileReader reader = new FileReader(filePath);
//System.out.println(reader.getEncoding());
BufferedReader reader = new BufferedReader(reader);
char[] buf = new char[1024];
int numRead=0;
while((numRead=reader.read(buf)) != -1){
String readData = String.valueOf(buf, 0, numRead);
fileData.append(readData);
buf = new char[1024];
}
reader.close();
return fileData.toString();
}
The above code doesn't work. I found the FileReader's encoding is CP1252 even if the text is UTF-8 encoded. But the JavaDoc of java.io.FileReader says that:
The constructors of this class assume that the default character encoding and the default byte-buffer size are appropriate.
Does this mean that I am not required to set character encoding by myself if I am using FileReader? But I did get wrongly encoded data currently, what's the correct way to deal with my situtaion? Thanks.