0
votes

I'm trying to convert a multibyte (UTF-8) string to a wide-character string, and mbsnrtowcs always fails. Here are the input and expected strings:

const char* pInputMultiByteString = "A quick brown Fox jumps \xC2\xA9 over the lazy Dog.";
const wchar_t* pExpectedWideString = L"A quick brown Fox jumps \x00A9 over the lazy Dog.";

The special character is the copyright symbol (U+00A9).

This conversion works fine when I use the Windows MultiByteToWideChar routine, but since that API is not available on Linux, I have to use mbsnrtowcs, which fails. I've tried other characters as well and it always fails. The only exception is that mbsnrtowcs works fine when the input string is pure ASCII. What am I doing wrong?

3
If you want it portable, why not use boost::nowide? - Pavel Radzivilovsky
@Pavel I have to use mbsnrtowcs - tunafish24
@tunafish24: so what will you do if you cannot do it with mbsnrtowcs? - Yakov Galka

3 Answers

1
votes

UTF-8 is not automatically a "multibyte string" in the C library's sense (although it is true that Unicode characters will be represented using more than one byte). A multibyte string is a string that uses a particular code page, namely the current locale's encoding, to represent characters, some of which take more than one byte.

Since you are mixing ANSI characters with Unicode characters, you should use UTF-8.

So converting UTF-8 to wchar_t (which is UTF-16 on Windows and UTF-32 on Linux) with mbsnrtowcs can't be done unless the current locale's encoding is itself UTF-8.

If you use UTF-8, you should look into a Unicode-handling library. For most tasks I recommend UTF8-CPP from http://utfcpp.sourceforge.net/

You can read more on Unicode and UTF-8 on Wikipedia.

0
votes

MultiByteToWideChar has a parameter where you specify the code page, but mbsnrtowcs doesn't. On Linux, have you set LC_CTYPE in your locale to specify UTF-8?
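
One way to check this from inside the program is to select a locale and ask which character encoding it uses. The sketch below is only an illustration: locale_is_utf8 is a made-up helper name, and it assumes the C library reports the encoding as exactly "UTF-8" (glibc does; other systems may spell it differently).

```c
#define _POSIX_C_SOURCE 200809L
#include <langinfo.h>
#include <locale.h>
#include <string.h>

/* Hypothetical helper: select `name` as the LC_CTYPE locale and report
   whether its encoding is UTF-8.
   Returns 1 if it is, 0 if it is not, -1 if the locale is unavailable. */
int locale_is_utf8(const char *name)
{
    if (setlocale(LC_CTYPE, name) == NULL)
        return -1;                      /* locale not installed or bad name */

    /* nl_langinfo(CODESET) names the encoding of the current LC_CTYPE locale.
       Assumption: a UTF-8 locale is reported as exactly "UTF-8". */
    return strcmp(nl_langinfo(CODESET), "UTF-8") == 0;
}
```

Calling locale_is_utf8("") uses whatever LANG / LC_ALL / LC_CTYPE say in the environment; locale_is_utf8("C") selects the default "C" locale that every program starts in, which is not UTF-8 on glibc.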

0
votes

SOLUTION: By default, every C program starts in the "C" locale, so I had to call setlocale(LC_CTYPE, ""). The empty string "" means the locale is taken from my environment, i.e. en_US.utf8, and with that the conversion worked.
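
For anyone landing here later, this is roughly what the working conversion looks like. It is a minimal sketch, not the exact code from my project: to_wide is a hypothetical wrapper name, and it assumes the caller has already selected a UTF-8 locale with setlocale.

```c
#define _POSIX_C_SOURCE 200809L
#include <locale.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical wrapper around mbsnrtowcs: convert `src` (encoded in the
   current LC_CTYPE locale) into `dst`, which has room for `dst_len` wide
   characters including the terminator.
   Returns the number of wide characters written, not counting the
   terminator, or (size_t)-1 on an invalid sequence or a too-small buffer. */
size_t to_wide(const char *src, wchar_t *dst, size_t dst_len)
{
    mbstate_t state;
    const char *p = src;
    size_t n;

    memset(&state, 0, sizeof state);    /* start in the initial shift state */

    /* strlen(src) + 1 lets the terminating null byte be converted as well. */
    n = mbsnrtowcs(dst, &p, strlen(src) + 1, dst_len, &state);
    if (n == (size_t)-1)
        return n;                       /* invalid sequence for this locale */
    if (p != NULL)
        return (size_t)-1;              /* stopped early: dst was too small */
    return n;
}
```

Typical use is to call setlocale(LC_CTYPE, "") once at startup and then to_wide(pInputMultiByteString, buffer, bufferLength). Without the setlocale call the program stays in the "C" locale and the \xC2\xA9 bytes are not decoded as UTF-8, which is the failure described in the question.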