6
votes

I am modernizing a large, legacy MFC codebase which contains a veritable medley of string types:

  • CString
  • std::string
  • std::wstring
  • char*
  • wchar_t*
  • _bstr_t

I'd like to standardize on a single string type internally, and convert to the other types only when absolutely required by a third-party API (e.g. COM or MFC functions). The question my coworkers and I are debating: which string type should we standardize on?

I would prefer one of the C++ standard strings: std::string or std::wstring. I'm personally leaning toward std::string, because we do not have any need for wide characters - it is an internal codebase with no customer-facing UI (i.e. no need for multiple-language support). "Plain" strings allow us to use simple, unadorned string literals ("Hello world" vs L"Hello world" or _T("Hello world")).
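
To make that concrete, here is a rough sketch of the kind of boundary helpers I have in mind (ToCString/FromCString are made-up names, and this assumes our std::strings hold text in the current ANSI code page):

    // Hypothetical boundary helpers: std::string everywhere internally,
    // converted to/from CString only where MFC requires it.
    #include <atlstr.h>    // CString
    #include <atlconv.h>   // CT2A conversion class
    #include <string>

    CString ToCString(const std::string& s)   { return CString(s.c_str()); }   // char -> TCHAR
    std::string FromCString(const CString& s) { return std::string(CT2A(s)); } // TCHAR -> char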

Is there an official stance from the programming community? When faced with multiple string types, what is typically used as the standard 'internal' storage format?

Windows internally is UTF-16LE, so std::wstring is a good fit for that platform; so is std::vector<wchar_t>. – Richard Critten
For a Windows application, use std::wstring. With narrow strings you'd need conversions all over the place. Note: since you don't already know this, you're not a good choice of person to do the job; it's basics. That choice is your manager's fault. – Cheers and hth. - Alf
Re _T("Hello world"), the T macros were obsoleted in the year 2000 by the introduction of Layer for Unicode, and today our tools can't produce executables for the Windows versions (9x) that these macros target. I understand it's a legacy codebase. But when your task is to clean it up, mentioning T macros as convenient is absurd and very counter-productive.Cheers and hth. - Alf
If you choose narrow chars, then all you need to break your program is one employee with a non-Latin name, and you hit encoding problems for that user's directory and everything below it. – Richard Critten
@BTownTKD: your statement "Windows provides narrow-char alternatives for nearly all APIs" is based on full ignorance. The narrow functions do a conversion to/from Windows ANSI, which is (1) system specific, and (2) unable to represent e.g. all filesystem paths. Also, many APIs, especially newer ones, have no ANSI wrappers. – Cheers and hth. - Alf

2 Answers

7
votes

If we are talking about Windows, then I'd use std::wstring (because we usually need proper string operations, not just raw pointers), or wchar_t* if you just pass strings around.

Note that Microsoft recommends this here: Working with Strings

Windows natively supports Unicode strings for UI elements, file names, and so forth. Unicode is the preferred character encoding, because it supports all character sets and languages. Windows represents Unicode characters using UTF-16 encoding, in which each character is encoded as a 16-bit value. UTF-16 characters are called wide characters, to distinguish them from 8-bit ANSI characters. The Visual C++ compiler supports the built-in data type wchar_t for wide characters.

Also:

When Microsoft introduced Unicode support to Windows, it eased the transition by providing two parallel sets of APIs, one for ANSI strings and the other for Unicode strings. [...] Internally, the ANSI version translates the string to Unicode.

Also:

New applications should always call the Unicode versions. Many world languages require Unicode. If you use ANSI strings, it will be impossible to localize your application. The ANSI versions are also less efficient, because the operating system must convert the ANSI strings to Unicode at run time. [...] Most newer APIs in Windows have just a Unicode version, with no corresponding ANSI version.
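
As a minimal sketch of what that means in practice, assuming a Unicode build (MessageBoxW stands in for any wide API here):

    // std::wstring feeds the W ("wide") API versions directly,
    // so Windows performs no run-time string conversion.
    #include <windows.h>
    #include <string>

    int main() {
        std::wstring message = L"Hello, \u4E16\u754C";            // any Unicode text
        ::MessageBoxW(nullptr, message.c_str(), L"Demo", MB_OK);  // explicit W version
        // MessageBoxA would force an ANSI -> UTF-16 conversion at run time.
        return 0;
    }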

1
vote

It depends.

When programming for Windows, I recommend using std::wstring at least for:

  • Resources (Strings, Dialogs, etc.)
  • Filesystem access (Windows allows non-ASCII characters in file and directory names - including all the "wrong kinds of apostrophes", by the way - and such names are impossible to open through the ANSI API)
  • COM (a BSTR is always wide character; see the sketch just after this list)
  • Other user-facing interfaces (clipboard, system error reporting, etc)
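
A minimal sketch of the COM point from the list above (ToBstr/FromBstr are hypothetical helper names):

    // A BSTR is always UTF-16, so std::wstring round-trips without
    // re-encoding; _bstr_t manages the SysAllocString/SysFreeString
    // lifetime automatically.
    #include <comutil.h>   // _bstr_t; link with comsuppw.lib
    #include <string>

    _bstr_t ToBstr(const std::wstring& s) { return _bstr_t(s.c_str()); }

    std::wstring FromBstr(const _bstr_t& b) {
        const wchar_t* p = static_cast<const wchar_t*>(b);
        return p ? std::wstring(p) : std::wstring();  // a null BSTR means "empty"
    }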

However, internal ASCII data files and UTF-8-encoded data are easier to handle using narrow (char-based) strings. It's fast, efficient and straightforward.
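
For example, the conversion to wide characters at an API boundary can live in a single helper (Utf8ToWide is a hypothetical name):

    // Converts an internal UTF-8 std::string to the std::wstring that
    // the wide Windows APIs expect, via MultiByteToWideChar.
    #include <windows.h>
    #include <string>

    std::wstring Utf8ToWide(const std::string& utf8) {
        if (utf8.empty()) return std::wstring();
        const int len = ::MultiByteToWideChar(CP_UTF8, 0, utf8.data(),
                                              static_cast<int>(utf8.size()),
                                              nullptr, 0);
        std::wstring wide(len, L'\0');
        ::MultiByteToWideChar(CP_UTF8, 0, utf8.data(),
                              static_cast<int>(utf8.size()), &wide[0], len);
        return wide;
    }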

There may also be other aspects that are not mentioned in the question - databases, external APIs, input/output files and their character sets, and so on - all of which play a role when deciding on the best data structures for the job.

"UTF-8 everywhere" is a sound idea in general. But there is 0 Windows API that takes UTF-8. Even the std::experimental::filesystem API uses std::wstring on Windows and std::string on POSIX.