
I have a CSV file that appears to be correctly encoded in UTF-8.

   iconv -f UTF-8 file.csv -o /dev/null  # returns 0

When I try to recode the file to cp1250 (I have tried recode, iconv, and even Perl), the resulting file is reported as iso-8859-1, at least according to

   file -i resulting_file.csv
   resulting_file.csv: text/plain; charset=iso-8859-1
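
For example, the iconv attempt was along these lines (shown here as a representative invocation; recode and Perl were tried equivalently):

   iconv -f UTF-8 -t CP1250 file.csv -o resulting_file.csv   # representative invocation; exact flags may have differed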

The locale settings on the server are

   LANG=en_US.UTF-8
   LC_CTYPE="en_US.UTF-8"
   LC_NUMERIC="en_US.UTF-8"
   LC_TIME="en_US.UTF-8"
   LC_COLLATE="en_US.UTF-8"
   LC_MONETARY="en_US.UTF-8"
   LC_MESSAGES="en_US.UTF-8"
   LC_PAPER="en_US.UTF-8"
   LC_NAME="en_US.UTF-8"
   LC_ADDRESS="en_US.UTF-8"
   LC_TELEPHONE="en_US.UTF-8"
   LC_MEASUREMENT="en_US.UTF-8"
   LC_IDENTIFICATION="en_US.UTF-8"
   LC_ALL=

I can't figure out why. Any help appreciated, thanks.


1 Answer


ISO-8859-1, ISO-8859-15, and Windows-1252 (code page 1252) are very similar character sets, differing only in a handful of characters and/or code points. For example, ISO-8859-1 doesn't have a euro (€) symbol; Windows-1252 and ISO-8859-15 do, but they map it to different bytes.
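
You can see the euro difference directly with iconv and od (a quick sketch assuming GNU iconv in a UTF-8 locale; encoding names may need adjusting on other systems):

   printf '€' | iconv -f UTF-8 -t WINDOWS-1252 | od -An -tx1   # prints 80
   printf '€' | iconv -f UTF-8 -t ISO-8859-15  | od -An -tx1   # prints a4
   printf '€' | iconv -f UTF-8 -t ISO-8859-1                   # fails: ISO-8859-1 has no euro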

file uses "magic" lookups to guess the encoding. If the characters that distinguish those character sets don't occur in the text, file can't differentiate between the three.
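
A quick way to see this (a sketch assuming a typical GNU/Linux system; the exact file output can vary by version):

   printf 'café\n' | iconv -f UTF-8 -t WINDOWS-1252 > demo.txt   # é is byte 0xe9 in all three charsets
   file -i demo.txt
   # demo.txt: text/plain; charset=iso-8859-1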

It certainly sounds like you have some non-ASCII Latin characters, but not enough of them for file to tell the difference.

You can rest easy, though: your file is compatible with Windows-1252 encoding.
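
If you would rather verify the conversion than trust file's guess, round-trip the result back to UTF-8 and compare it with the original (a sketch; substitute CP1250 with whatever target encoding you actually used):

   iconv -f CP1250 -t UTF-8 resulting_file.csv | diff - file.csv && echo "conversion round-trips cleanly"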