0 votes

Can somebody please tell me what would be the Unicode equivalent for "(char)"?

For ASCII I always used for example

 (char)(7)

Now I want to do the same but for Unicode.

But

 (wchar_t)(7)

did not work out, and wchar does not exist.

I want to pass this (possible) Unicode character to a function that accepts a wstring.

Or in other words: How can I programmatically create a wstring from a Unicode character value (like 7 for BEL)?

2
Could you show some code, please? These fragments aren't very useful. – bash.d
"Did not work out" is not enough of a problem description. I don't think anyone has any idea of what you are trying to do. – R. Martinho Fernandes
C++ isn't ASCII-aware IIRC, but in C++11 char must be able to store an 8-bit UTF-8 code unit; there are also char16_t and char32_t (at least 16 and 32 bits wide, respectively). Note that a single glyph can be represented by multiple UTF-8/-16 code units, so there's no exact equivalent to a char interpreted as an ASCII character. – dyp
I don't know what "ASCII-awareness" is, but it sounds completely useless to me. And in any case, C++ is aware of ASCII in footnote 14. – R. Martinho Fernandes
The C++ Unicode equivalent for char is... char. utf8everywhere.org – Pavel Radzivilovsky

2 Answers

3 votes

I want to pass this (possible) Unicode character to a function that accepts a wstring.

Then you'll need to make a wstring, just as you would have to make a string from a char if you needed to pass an ASCII character to a function accepting a string.

function(std::wstring(1, 7)); // length 1, filled with value 7
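
As a complete, runnable sketch (printLength here is a hypothetical stand-in for whatever wstring-accepting function you actually need to call):

    #include <iostream>
    #include <string>

    // Hypothetical receiver; stands in for any function that accepts a wstring.
    void printLength(const std::wstring& s) {
        std::cout << "length: " << s.size() << '\n';
    }

    int main() {
        // Build a one-character wstring from a numeric character value.
        printLength(std::wstring(1, wchar_t(7))); // prints "length: 1"
    }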
0 votes

Data in memory is not stored "in Unicode". Unicode provides a unique number (a code point) for every character used on computers (well, a whole lot of them).

In memory, characters are encoded: an encoding maps Unicode code points to sequences of byte values.

What encoding are you using? Each one maps code points to bytes differently; see the sketch after this list.

  • UTF-8: Each code point is mapped to a sequence of 1 to 4 bytes (the original design allowed up to 6, but the standard now caps it at 4).
  • UTF-16: Each code point is mapped to a 2- or 4-byte sequence.
  • UCS-2: An obsolete fixed-width encoding that maps only part of Unicode (the Basic Multilingual Plane) to 2-byte sequences.
  • UTF-32: Each code point is mapped to a 4-byte sequence.
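
To make the size differences concrete, here is a minimal C++11 sketch using Unicode string literals (note: in C++20 the u8 literal type changes to char8_t, so this targets C++11/14/17). It prints how many bytes the single code point U+00E9 (é) occupies in each encoding:

    #include <iostream>

    int main() {
        // String literals include a terminating null, so subtract one unit.
        const char     utf8[]  = u8"\u00E9"; // 2 bytes: 0xC3 0xA9
        const char16_t utf16[] = u"\u00E9";  // one 16-bit code unit
        const char32_t utf32[] = U"\u00E9";  // one 32-bit code unit

        std::cout << "UTF-8:  " << sizeof(utf8)  - sizeof(char)     << " bytes\n"; // 2
        std::cout << "UTF-16: " << sizeof(utf16) - sizeof(char16_t) << " bytes\n"; // 2
        std::cout << "UTF-32: " << sizeof(utf32) - sizeof(char32_t) << " bytes\n"; // 4
    }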

wchar_t on Win32 is 16-bit, and characters are expected to be encoded as UTF-16 (originally UCS-2).

wchar_t on most *NIXes is 32-bit, and characters are expected to be encoded as UTF-32.
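
You can verify what your platform uses with a one-line check (typical output: 2 on Windows, 4 on Linux and macOS):

    #include <iostream>

    int main() {
        // Typically 2 on Win32 (UTF-16 code units), 4 on most *NIXes (UTF-32).
        std::cout << "sizeof(wchar_t) = " << sizeof(wchar_t) << '\n';
    }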

UPDATE

Huh, looks like I'm old. My last work was in VS2005, and there were still references to UCS-2 as the internal encoding, but I guess even that was already out of date.