Incorrect output for UTF8 conversion using iconv

Question

I am trying to convert a string encoded in ISO-8859-1 to UTF-8 on Linux, using the iconv function in C++. This is the code that I have:

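#include <iconv.h>
#include <cstdlib>
#include <iostream>

using std::cout;

int main()
{
    // Conversion from ISO-8859-1 to UTF-8
    iconv_t cd = iconv_open("UTF-8", "ISO-8859-1");

    char input[] = "\x80"; // intended as "€"; byte value 128, assumed to be ISO-8859-1
    char *inputbuf = input;
    const size_t inputLen = 1;
    size_t inputSize = inputLen; // iconv decrements this as it consumes input

    const size_t outputLen = inputLen * 4; // maximum size of a character in UTF-8 is 4 bytes
    char *output = (char*)calloc(outputLen, 1); // zero-filled so unused bytes print as 0
    char *outputbuf = output;
    size_t outputSize = outputLen; // iconv decrements this as it writes output

    // Conversion function
    iconv(cd, &inputbuf, &inputSize, &outputbuf, &outputSize);

    // Display input bytes (ISO-8859-1)
    cout << "input bytes(ISO-8859-1): ";
    for (size_t i = 0; i < inputLen; i++)
    {
        cout << (int)(unsigned char)input[i] << ", ";
    }
    cout << std::endl;

    // Display converted bytes (UTF-8)
    cout << "output bytes(UTF-8): ";
    for (size_t i = 0; i < outputLen; i++) // displaying all 4 allocated bytes
    {
        cout << (int)(unsigned char)output[i] << ", ";
    }
    cout << std::endl;

    free(output);
    iconv_close(cd);
    return 0;
}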

This is the output I observe:

input bytes(ISO-8859-1): 128
output bytes(UTF-8): 194, 128, 0, 0

As you can see, the converted UTF-8 bytes are 194, 128. However, the expected UTF-8 output is 226, 130, 172. I verified that none of the iconv functions reports an error.

Can anyone please help me figure out if I am missing anything here?

According to this table, the code 128 is undefined in the ISO 8859-1 code page. — Mr.C64
€ is NOT byte 128 (0x80) in ISO-8859-1. In fact, byte 0x80 is unassigned in ISO-8859-1. You are thinking of Windows-1252 (or other similar charset), which does have € in byte 0x80 (it is not always 0x80 in all supporting charsets, though). Windows-1252 is commonly mistaken for ISO-8859-1. — Remy Lebeau
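
To illustrate the point made in the comments, here is a minimal sketch, assuming a glibc-based Linux system where iconv recognizes the encoding name "WINDOWS-1252". The helper name convertEncoding is hypothetical and not from the original post. Byte 0x80 read as ISO-8859-1 maps to the control character U+0080, which is 194, 128 in UTF-8; read as Windows-1252 it is € (U+20AC), which is 226, 130, 172.

```cpp
#include <iconv.h>
#include <stdexcept>
#include <string>

// Hypothetical helper (not from the original post): converts `in` from the
// encoding named `from` to the encoding named `to` using iconv.
std::string convertEncoding(const std::string& in, const char* from, const char* to)
{
    // Note the argument order: iconv_open takes the target encoding first.
    iconv_t cd = iconv_open(to, from);
    if (cd == (iconv_t)-1)
        throw std::runtime_error("iconv_open failed");

    // Generous output buffer: 4 bytes per input byte, plus a little slack.
    std::string out(in.size() * 4 + 4, '\0');

    char* inPtr = const_cast<char*>(in.data()); // iconv does not modify the input
    size_t inLeft = in.size();
    char* outPtr = &out[0];
    size_t outLeft = out.size();

    size_t rc = iconv(cd, &inPtr, &inLeft, &outPtr, &outLeft);
    iconv_close(cd);
    if (rc == (size_t)-1)
        throw std::runtime_error("iconv failed");

    // outLeft is the unused space, so trim the buffer to the bytes written.
    out.resize(out.size() - outLeft);
    return out;
}
```

With this helper, convertEncoding("\x80", "WINDOWS-1252", "UTF-8") should yield the three bytes the question expected, while passing "ISO-8859-1" as the source encoding reproduces the observed 194, 128.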

Llopeth · Accepted Answer · 2017-07-07T13:50:24

You can use either the utfcpp library (http://utfcpp.sourceforge.net/) or Boost.Locale for that purpose.
