7
votes

I create csv file with data by the means of java. And I faced the following well-known issue: the letters in Portuguese were displayed by the wrong way in Excel (when opening by double click).

I solved this by UTF-16LE+BOM, but excel started to recognize tabs as columns separators instead of commas.

So I looked up for another solution and saw many posts, in which people say that just adding UTF-8 BOM and writing file in UTF-8 will do the job for Excel 2007 and later. I tried the simpliest sample on my work computer and it failed. But when I tried this at my home computer it worked like a charm.

Both computers have the same versions of java installed and operating system Windows 7. I am confused. Can anyone tell what can cause such a strange behaviour?

You can see my simpliest sample below:

String filename = "D:/check/test_with_bom.csv";
        FileOutputStream fos = new FileOutputStream(filename);
        byte[] bom = new byte[] { (byte)0xEF, (byte)0xBB, (byte)0xBF }; 
        fos.write(bom);
        OutputStreamWriter osw = new OutputStreamWriter(fos , "UTF-8");
        PrintWriter printWriter = new PrintWriter(osw);

        printWriter.print("Hello,Olá,ão,ção");
        printWriter.close();
1
Same version of Excel as well?Robin Green
Why are you creating that obsolete PrintWriter? Is calling write on the Writer so much harder than calling print on the PrintWriter?Holger
Robin Green, I guess yes - both machines has 2007 Excel. The obvious difference that I see is that on work machine I have russian excel.me1111
Holger, changing print to write doesn't change the results. This is just a test sample, that I wrote in order to prove the possibility of such approach.me1111

1 Answers

8
votes

You should be aware that Excel does not "open" csv files. It converts them to an Excel file on the fly, using defaults. These defaults can be different depending on your regional settings. Because of that, it's never a good idea to let Excel open csv files using the defaults, since you'll never know for sure what you end up with.

A safer method is to use the 'import from text' method, and explicitly specify the delimiter, encoding, ect... Yet, be aware that 'save as csv' in Excel is an even worse idea, since it does not allow you to specify the encoding, delimiter, or any other detail. Access does.

On American Windows versions of Excel, the default column separator is a comma. On European Windows versions the comma is reserved for the Decimal Symbol and to avoid conflicts, a semicolon is used by default as column separator.

If you -realy- -realy- -have- to use CSV, you can consider adding the "sep=," indicator at the top of your csv file. yet, be aware that this will probably cause problems in other applications.