Multi-Byte to Widechar conversion using mbsnrtowcs

Question

I'm trying to convert a multi-byte (UTF-8) string to a wide-character string, and mbsnrtowcs always fails. Here are the input and expected strings:

const char* pInputMultiByteString = "A quick brown Fox jumps \xC2\xA9 over the lazy Dog.";
const wchar_t* pExpectedWideString = L"A quick brown Fox jumps \x00A9 over the lazy Dog.";

The special character is the copyright symbol (U+00A9).

This conversion works fine when I use the Windows MultiByteToWideChar routine, but since that API is not available on Linux, I have to use mbsnrtowcs, which fails. I've tried other characters as well and it always fails. The only exception is a pure ASCII input string, for which mbsnrtowcs works fine. What am I doing wrong?

@tunafish24: so what will you do if you cannot do it with mbsnrtowcs? — Yakov Galka

Sebastian Cabot · Accepted Answer · 2012-11-10T12:40:23

A UTF string is not necessarily a multibyte string in the C library's sense (although it is true that Unicode characters may be encoded using more than one byte). A multibyte string is a string encoded in a particular code page, the one selected by the current locale, in which some characters use more than one byte.

Since your input combines ASCII characters with a UTF-encoded character (\xC2\xA9), you should treat it as UTF-8.

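So converting UTF-8 to wchar_t (which is UTF-16 on Windows and UTF-32 on Linux) with mbsnrtowcs can't be relied on: the function decodes its input according to the current locale's LC_CTYPE setting, not as UTF-8 unconditionally, and a program that never calls setlocale runs in the "C" locale, which would explain why only ASCII input works for you.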

If you use UTF-8, you should look into a Unicode-handling library. For most tasks I recommend UTF8-CPP from http://utfcpp.sourceforge.net/
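To give an idea of what such a library does under the hood, here is a minimal sketch of a locale-independent UTF-8 decoder in plain C. The function name utf8_to_wcs and its return convention are made up for this illustration and do not come from UTF8-CPP or any other library. It also assumes wchar_t is wide enough to hold any code point (UTF-32, as on Linux); on Windows, where wchar_t is UTF-16, code points above U+FFFF would need surrogate pairs, which this sketch does not produce.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper, not part of any library.
 * Decodes the NUL-terminated UTF-8 string `src` into `dst`, writing at most
 * `dst_len` wide characters including the terminating L'\0'.
 * Returns the number of wide characters written (terminator excluded), or
 * (size_t)-1 on malformed input or insufficient space.
 * Assumption: wchar_t can hold any code point (UTF-32, e.g. Linux). */
size_t utf8_to_wcs(const char *src, wchar_t *dst, size_t dst_len)
{
    /* Smallest code point that legitimately needs 0..3 continuation bytes. */
    static const unsigned long min_cp[] = { 0x0, 0x80, 0x800, 0x10000 };

    assert(src != NULL && dst != NULL);

    const unsigned char *p = (const unsigned char *)src;
    size_t out = 0;

    while (*p != 0) {
        unsigned long cp;
        int extra; /* number of continuation bytes that must follow */

        if (*p < 0x80)                { cp = *p;        extra = 0; }
        else if ((*p & 0xE0) == 0xC0) { cp = *p & 0x1F; extra = 1; }
        else if ((*p & 0xF0) == 0xE0) { cp = *p & 0x0F; extra = 2; }
        else if ((*p & 0xF8) == 0xF0) { cp = *p & 0x07; extra = 3; }
        else return (size_t)-1; /* stray continuation or invalid lead byte */
        ++p;

        for (int i = 0; i < extra; ++i, ++p) {
            /* Also catches a string that ends in the middle of a sequence. */
            if ((*p & 0xC0) != 0x80) return (size_t)-1;
            cp = (cp << 6) | (unsigned long)(*p & 0x3F);
        }

        /* Reject overlong encodings, surrogates and out-of-range values. */
        if (cp < min_cp[extra] || cp > 0x10FFFF ||
            (cp >= 0xD800 && cp <= 0xDFFF))
            return (size_t)-1;

        if (out + 1 >= dst_len) return (size_t)-1; /* keep room for L'\0' */
        dst[out++] = (wchar_t)cp;
    }

    if (dst_len == 0) return (size_t)-1;
    dst[out] = L'\0';
    return out;
}
```

Called on pInputMultiByteString from the question with a large enough buffer, this should yield a string equal to pExpectedWideString, with U+00A9 at index 24. For real projects a tested library is still the safer choice; the sketch only shows that the conversion itself does not depend on the locale.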

You can read more about Unicode and UTF-8 on Wikipedia.
