11 votes

I am reading a table through RJDBC from a MySQL database, and R displays all characters correctly (e.g., נווה שאנן). However, even when exporting with write.csv and fileEncoding="UTF-8", the output for Bulgarian, Hebrew, Chinese and so on looks like <U+0436>.<U+043A>. <U+041B><U+043E><U+0437><U+0435><U+043D><U+0435><U+0446> (in this case a Bulgarian string, not the one above). Other special characters like ã, ç, etc. work fine.

I suspect this has to do with the UTF-8 BOM, but I did not find a solution on the net.

My OS is a German Windows 7.

edit: I tried

con <- file("file.csv", encoding = "UTF-8")
write.csv(x, con, row.names = FALSE)

and the (as far as I know) equivalent write.csv(x, file = "file.csv", fileEncoding = "UTF-8", row.names = FALSE).

Are you saying that when you open the exported file, you see "U+0436" instead of "ж"? If so, that's not a BOM issue, just an issue of the Unicode code points not being encoded into a UTF encoding but output as code points. Maybe show us some code for how exactly you're exporting the file? – deceze
I added information on how I exported the file. And yes, I see "<U+0436>" instead of "ж". – Arthur G
Seeing "<U+0436>" in the file is ambiguous (it could even mean that those characters are literally inlined in the file, or that your editor just cannot display them). You could either write the "ж" to a file and tell us the hex values of all the characters the generated file contains (open it in a hex editor), or give us code to reproduce your problem (of course we don't have your DB, so create a vector with the sample data). – Bernd Elkemann
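Following that suggestion, one minimal way to inspect the actual bytes from within R itself (the file name "check.txt" is made up for illustration):

```r
# Write a single "ж" to a binary file and dump the bytes it contains.
v <- "\u0436"                                   # "ж"
con <- file("check.txt", open = "wb")
writeBin(charToRaw(enc2utf8(v)), con)
close(con)
bytes <- readBin("check.txt", what = "raw",
                 n = file.info("check.txt")$size)
print(bytes)   # UTF-8 encodes U+0436 as the two bytes d0 b6
```

If the exported CSV instead contains the literal characters `<U+0436>`, the hex dump will show plain ASCII bytes, which is the "code points output as text" case described in the first comment.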

2 Answers

11 votes

The accepted answer did not help me in a similar situation (R 3.1 on Windows, trying to open the file in Excel). Anyway, based on this part of the documentation for file:

If a BOM is required (it is not recommended) when writing it should be written explicitly, e.g. by writeChar("\ufeff", con, eos = NULL) or writeBin(as.raw(c(0xef, 0xbb, 0xbf)), binary_con)

I came up with the following workaround:

write.csv.utf8.BOM <- function(df, filename) {
    con <- file(filename, "w")
    tryCatch({
        # re-encode every column as UTF-8 before writing
        for (i in seq_len(ncol(df)))
            df[, i] <- iconv(df[, i], to = "UTF-8")
        # write the BOM explicitly, as ?file recommends
        writeChar(iconv("\ufeff", to = "UTF-8"), con, eos = NULL)
        write.csv(df, file = con)
    }, finally = {
        close(con)
    })
}

Note that df is the data frame and filename is the path to the CSV file.
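For example, a usage sketch of the helper above (the sample data and the file name "out.csv" are made up for illustration):

```r
# Build a small data frame with non-Latin text and export it with a BOM.
X <- data.frame(v1 = rep("נווה שאנן", 3),
                v2 = LETTERS[1:3],
                stringsAsFactors = FALSE)
write.csv.utf8.BOM(X, "out.csv")

# The file now starts with the UTF-8 BOM bytes EF BB BF:
readBin("out.csv", what = "raw", n = 3)
```

Excel uses those three leading bytes to detect that the file is UTF-8, which is why it opens correctly there.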

6 votes

On the help page for Encoding (help("Encoding")) you can read about a special encoding: bytes.

Using this, I was able to generate the CSV file with:

v <- "נווה שאנן"
X <- data.frame(v1 = rep(v, 3), v2 = LETTERS[1:3], v3 = 0,
                stringsAsFactors = FALSE)

Encoding(X$v1) <- "bytes"
write.csv(X, "test.csv", row.names = FALSE)

Take care with the difference between factor and character columns. The following should work for both:

id_characters <- which(sapply(X, function(x)
    # any() because Encoding() returns one value per element
    is.character(x) && any(Encoding(x) == "UTF-8")))
for (i in id_characters) Encoding(X[[i]]) <- "bytes"

id_factors <- which(sapply(X, function(x)
    is.factor(x) && any(Encoding(levels(x)) == "UTF-8")))
for (i in id_factors) Encoding(levels(X[[i]])) <- "bytes"

write.csv(X, "test.csv", row.names = FALSE)