What is the current modern term for “Multi-byte Character Set”

Question

I have been confused about this for quite a while:

Confusion on Unicode and Multibyte Articles

After reading the comments from all the contributors there, I also looked at:

An old article (from 2001), http://www.hastingsresearch.com/net/04-unicode-limitations.shtml, which describes Unicode as:

being a 16-bit character definition allowing a theoretical total of over 65,000 characters. However, the complete character sets of the world add up to over 170,000 characters.

and at a current, "modern" article, http://en.wikipedia.org/wiki/Unicode, which says:

The most commonly used encodings are UTF-8 (which uses 1 byte for all ASCII characters, which have the same code values as in the standard ASCII encoding, and up to 4 bytes for other characters), the now-obsolete UCS-2 (which uses 2 bytes for all characters, but does not include every character in the Unicode standard), and UTF-16 (which extends UCS-2, using 4 bytes to encode characters missing from UCS-2).

It seems that in the VC2008 project settings, the "Unicode" option under Character Set really means "Unicode encoded in UCS-2" (or UTF-16? I am not sure).

I tried to verify this by running the following code under VC2008:

#include <cstdio>    // getchar
#include <iostream>

int main()
{
    // Unicode encoded in UCS-2?
    std::cout << sizeof(L"我爱你") << std::endl;
    // Unicode encoded in UCS-2?
    std::cout << sizeof(L"abc") << std::endl;
    getchar();  // keep the console window open

    // Compiled with Character Set: Use Unicode Character Set.
    // Prints 8, 8

    // Compiled with Character Set: Use Multi-Byte Character Set.
    // Prints 8, 8
}

When compiling with the Unicode Character Set option, the outcome matched my assumption.

But what about Multi-byte Character Set? What does Multi-byte Character Set mean in the current "modern" world? :)

MBCS means nothing. Today we have Unicode. All you knew before is gone (mostly). — John Saunders
The L prefix makes the compiler treat both strings as wide-character strings, which explains the (8, 8) result you obtained. Removing the L gives (7, 4), as per Microsoft standard /shrug — YeenFei
@Potatoswatter: What are you talking about? A string literal has array type, in this case wchar_t const[4]. When you dereference that, the array first decays to a wchar_t const*. Dereferencing that in turn gives you a wchar_t const. Thus, *L"123456789" == L'1' and sizeof(*L"123456789")==sizeof(L'1') — MSalters
@MSalters: you're right; it was coincidence that his strings are a power of 2 size. Corrected in my answer. — Potatoswatter

MSN · Accepted Answer · 2010-03-10T07:02:01

http://en.wikipedia.org/wiki/Multi-byte_character_set

MBCS is a term used to denote a class of character encodings in which some characters cannot be represented with a single byte, hence multi-byte character set. Examples include Shift JIS, GBK, and Big5. In order to properly decode a string in this format, you need a code page that tells you how the various byte combinations map to characters. ISO/IEC 8859 is a related family of pre-Unicode code page standards, although its parts are single-byte (8-bit) encodings rather than multi-byte ones; according to Wikipedia, ISO stopped maintaining them in 2004, presumably to focus on Unicode.

So I guess the modern term for MBCS is "deprecated in favor of Unicode".
