C Program Strange Characters retrieved due to language setting on Windows

Question

If the code below is compiled with UNICODE defined as a compiler option, the GetComputerNameEx API returns junk characters. If it is compiled without UNICODE, the API returns a truncated hostname. The issue mostly shows up with non-English languages, particularly Asia-Pacific ones such as Chinese, Japanese, and Korean. Can anyone shed some light on how to resolve this?

#include <windows.h>
#include <stdio.h>
#include <stdlib.h>

#define INFO_SIZE 30

int main()
{
    int ret;
    TCHAR infoBuf[INFO_SIZE+1];
    DWORD  bufSize = (INFO_SIZE+1);
    char *buf;

    buf = (char *) malloc(INFO_SIZE+1);

    if (!GetComputerNameEx((COMPUTER_NAME_FORMAT)1,
                                (LPTSTR)infoBuf, &bufSize))
    {
        printf("GetComputerNameEx failed (%lu)\n", GetLastError());
        return -1;
    }

    ret = wcstombs(buf, infoBuf, (INFO_SIZE+1));
    buf[INFO_SIZE] = '\0';

    return 0;
}

You allocate only 30 bytes, regardless of whether you're using "ANSI" (eight-bit chars) or "UNICODE" (16-bit chars). — Adrian McCarthy

roeland · Accepted Answer · 2016-01-21T02:00:05

In the languages you mentioned, most characters take more than one byte in a multibyte encoding, because their character sets contain far more than 256 characters. So 30 characters may need more than 30 bytes.
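As a rough illustration of that point, the sketch below computes a worst-case byte count for a given number of characters using the standard MB_CUR_MAX macro (the maximum bytes per character in the current locale). The helper name worst_case_mb_bytes is hypothetical and not part of the question's code; this is one possible sizing approach, not the only one.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: upper bound on the bytes needed to hold `nchars`
 * characters plus a terminating '\0' in the current locale's multibyte
 * encoding. Returns 0 if the multiplication would overflow size_t. */
size_t worst_case_mb_bytes(size_t nchars)
{
    size_t per_char = (size_t)MB_CUR_MAX;  /* at least 1 in every locale */

    if (nchars > (SIZE_MAX - 1) / per_char) {
        return 0;  /* overflow guard */
    }
    return nchars * per_char + 1;
}
```

With a double-byte code page, worst_case_mb_bytes(30) would come out larger than the 31 bytes the question allocates, which is consistent with the truncation described.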

The usual pattern for calling a function like wcstombs goes like this: first get the number of bytes required, then allocate a buffer, then convert the string.

(Edit: calling wcstombs with a NULL destination to get the required size relies on a POSIX extension, which is also implemented on Windows.)

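/* First pass: number of bytes required, not counting the terminating '\0'. */
size_t size = wcstombs(NULL, infoBuf, 0);
if (size == (size_t) -1) {
    /* some character can't be converted */
}
/* The question is C, so allocate with malloc rather than C++ new. */
char *buf = malloc(size + 1);
/* Second pass: convert, leaving room for the terminator. */
size = wcstombs(buf, infoBuf, size + 1);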
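The same two-pass pattern can be wrapped in a small function with the error paths filled in. This is only a sketch: the name wide_to_multibyte is hypothetical, and it assumes the NULL-destination size query mentioned above is available on the platform.

```c
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical helper: converts a wide string to a newly allocated
 * multibyte string in the current locale's encoding.
 * Returns NULL if a character cannot be converted or allocation fails.
 * The caller is responsible for calling free() on the result. */
char *wide_to_multibyte(const wchar_t *wide)
{
    /* Pass 1: ask how many bytes are needed (terminator not included). */
    size_t needed = wcstombs(NULL, wide, 0);
    if (needed == (size_t)-1) {
        return NULL;  /* some character can't be converted */
    }

    char *out = malloc(needed + 1);
    if (out == NULL) {
        return NULL;
    }

    /* Pass 2: convert for real, with room for the terminating '\0'. */
    if (wcstombs(out, wide, needed + 1) == (size_t)-1) {
        free(out);
        return NULL;
    }
    return out;
}
```

Note that wcstombs converts according to the current C locale, and a program starts in the "C" locale, so calling setlocale(LC_ALL, "") before wide_to_multibyte(infoBuf) is probably needed for non-ASCII host names to convert at all.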
