0
votes

I have a UTF-16LE string 'TEST'; its hexdump is shown below:

feff 0074 0065 0073 0074 000a

If I convert this string to UTF-8 with the iconv command in bash, it converts without any issues:

6574 7473 000a

But if I do the same thing in my C program, the iconv function seems to treat the 0x00 byte that follows the character 'T' as a null terminator, even though I have specified the string length as 12 (including the BOM and null termination):

65 000a

Below is the code I am testing with. Converting a wide-character string of any size that contains no 0x00 bytes gives the correct output, however.

char *cOutput;    // output buffer, larger than the required size
size_t tOutput;
char *cInput;     // input string of wide characters
size_t tInput;
iconv_t cd;

........

cd = iconv_open("UTF8//TRANSLIT", "UTF-16LE");
iconv(cd, &cInput, &tInput, &cOutput, &tOutput);

Is there a solution to this problem, or am I doing something wrong? Any input will be appreciated.

1
A line of code is worth more than 1000 words. Show your code. – user529758
As H2CO3 said, show more code, like how you initialize the data and sizes. – unwind

1 Answer

1
votes

At a guess, your problem is that you are initialising tInput incorrectly, perhaps using strlen(cInput).
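To illustrate that guess, here is a minimal sketch of why strlen() gives the wrong count for UTF-16LE data. The buffer mirrors the bytes from the question; the names utf16le_sample, length_seen_by_strlen and real_byte_length are made up for this illustration and do not come from the asker's program.

```c
#include <stddef.h>
#include <string.h>

/* "test\n" in UTF-16LE preceded by the BOM (FF FE), as in the question's hexdump */
const char utf16le_sample[] = {
    '\xff', '\xfe', 't', '\0', 'e', '\0', 's', '\0', 't', '\0', '\n', '\0'
};

/* strlen() stops at the first zero byte, which is the high byte of 't' */
size_t length_seen_by_strlen(void)
{
    return strlen(utf16le_sample);
}

/* The byte count iconv() actually needs in its input-length argument */
size_t real_byte_length(void)
{
    return sizeof utf16le_sample;
}
```

Here length_seen_by_strlen() returns 3 (the two BOM bytes plus the low byte of the first character), while real_byte_length() returns 12. If the smaller value ends up in tInput, iconv() only ever sees the start of the buffer.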

This code produces the expected output for me:

#include <stdio.h>
#include <string.h>
#include <iconv.h>

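int main(void)
{
    /* "test\n" in UTF-16LE, preceded by a byte-order mark (FF FE) */
    char utf16le_str[] = { '\xff', '\xfe', '\x74', '\x00', '\x65', '\x00',
        '\x73', '\x00', '\x74', '\x00', '\x0a', '\x00' };
    char dest_str[100];
    char *in = utf16le_str;
    char *out = dest_str;
    /* Byte count of the input; strlen() would stop at the first 0x00 byte */
    size_t inbytes = sizeof utf16le_str;
    /* Reserve one byte for the terminating NUL written below */
    size_t outbytes = sizeof dest_str - 1;
    iconv_t conv = iconv_open("UTF-8//TRANSLIT", "UTF-16LE");

    if (conv == (iconv_t)-1) {
        perror("iconv_open");
        return 1;
    }

    if (iconv(conv, &in, &inbytes, &out, &outbytes) == (size_t)-1) {
        perror("iconv");
        iconv_close(conv);
        return 1;
    }

    iconv_close(conv);

    /* iconv() advanced out past the converted bytes; terminate the string there */
    *out = '\0';
    puts(dest_str);

    return 0;
}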
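
If the input arrives through a pointer, so that sizeof is not an option, the byte count has to come from somewhere else. The sketch below is one possibility, assuming the data ends with a 16-bit zero code unit (two zero bytes at an even offset). That is an assumption about your data, not something the question confirms, and utf16_byte_length is a hypothetical helper name, not part of iconv.

```c
#include <stddef.h>

/*
 * Hypothetical helper: number of bytes before the terminating 16-bit
 * zero code unit of a UTF-16 string. Assumes such a terminator exists.
 */
size_t utf16_byte_length(const char *s)
{
    size_t n = 0;

    /* Step one 16-bit code unit at a time; stop when both bytes are zero */
    while (s[n] != '\0' || s[n + 1] != '\0')
        n += 2;

    return n;
}
```

The result would then be passed as the input byte count, for example inbytes = utf16_byte_length(in), instead of a value obtained from strlen().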