What will happen if my application runs on a UNICODE machine & my application has one std::string in it?
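Computers are not ANSI or Unicode; the operating systems running on them are. Windows NT and every version descended from it (2000, XP and later) use Unicode internally. Windows 3.x had no Unicode support, and Windows 95/98/Me supported it only partially. An application compiled for ANSI still runs on a Unicode version of Windows. A std::string is just a sequence of bytes. When you pass one to an ANSI ("A") Win32 function, Windows converts it to Unicode using the system's active code page, so the application can only handle characters that exist in that code page.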
What exactly is the difference between the two (or more?) encodings?
What is ASCII?
ASCII is a seven-bit encoding technique which assigns a number to each of the 128 characters used most frequently in American English. This allows most computers to record and display basic text. ASCII does not include symbols frequently used in other countries.
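To make the seven-bit limit concrete, here is a minimal C++ sketch that checks whether a byte string stays within ASCII. The helper name is_ascii is hypothetical and is not a library function.

```cpp
#include <string>

// Hypothetical helper: returns true if every byte of s is a 7-bit ASCII
// value (0-127). Bytes 128-255 belong to some 8-bit code page or to a
// multi-byte encoding such as UTF-8, not to ASCII itself.
bool is_ascii(const std::string& s) {
    for (unsigned char c : s) {
        if (c > 0x7F) {
            return false;
        }
    }
    return true;
}
```

For example, is_ascii("Hello, world!") is true. is_ascii("caf\xC3\xA9") ("café" in UTF-8) is false because the é takes two bytes above 0x7F.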
What is Unicode?
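One major drawback of ASCII is that it has only 128 characters. Even the 8-bit extensions built on it (code pages such as Latin-1) top out at 256. Languages such as Chinese and Japanese use thousands of characters, so ASCII cannot represent them. Unicode was created to solve this. It was originally designed as a 16-bit code with room for 65,536 characters and has since been extended to 1,114,112 code points (U+0000 to U+10FFFF).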
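Characters above U+FFFF no longer fit in a single 16-bit unit. This sketch assumes a C++11 compiler with u"" and U"" literals and uses U+1F600 as an arbitrary example. It shows that such a character takes two UTF-16 code units (a surrogate pair) but only one UTF-32 code unit.

```cpp
#include <cstddef>
#include <string>

// U+1F600 (GRINNING FACE) lies above U+FFFF, i.e. outside the original
// 16-bit range of 65,536 characters.
const std::u16string kFaceUtf16 = u"\U0001F600";  // UTF-16: surrogate pair D83D DE00
const std::u32string kFaceUtf32 = U"\U0001F600";  // UTF-32: a single code unit

std::size_t face_utf16_units() { return kFaceUtf16.size(); }
std::size_t face_utf32_units() { return kFaceUtf32.size(); }
```

On Windows, wchar_t strings are UTF-16, so the same character occupies two wchar_t values there as well.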
Unicode is an effort by ISO and the Unicode Consortium to define a single character set for electronic text that covers every writing system in use. Unicode text is stored in 8-, 16- or 32-bit code units depending on the encoding form (UTF-8, UTF-16 or UTF-32), so its size varies. Plain English text takes the same space in UTF-8 as in ASCII, twice as much in UTF-16 and four times as much in UTF-32. The first 256 code points of Unicode are identical to Latin-1 (ISO 8859-1).
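The size trade-off can be sketched in C++, assuming C++11 char16_t/char32_t literals. The word "héllo" contains one non-ASCII letter, é (U+00E9, the same value as in Latin-1). It therefore needs 6 bytes in UTF-8, 10 in UTF-16 and 20 in UTF-32. The constant and function names are illustrative only.

```cpp
#include <cstddef>
#include <string>

// The same five-character word, "héllo", in three Unicode encoding forms.
const std::string    kUtf8  = "h\xC3\xA9llo";  // UTF-8 bytes spelled out (é = C3 A9)
const std::u16string kUtf16 = u"h\u00E9llo";   // UTF-16 code units
const std::u32string kUtf32 = U"h\u00E9llo";   // UTF-32 code units

// Storage size in bytes = number of code units * size of one unit.
std::size_t utf8_bytes()  { return kUtf8.size()  * sizeof(char); }
std::size_t utf16_bytes() { return kUtf16.size() * sizeof(char16_t); }
std::size_t utf32_bytes() { return kUtf32.size() * sizeof(char32_t); }
```

For pure ASCII text UTF-8 would match ASCII byte for byte. Here the é costs one extra byte.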
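In Win32, Unicode is enabled by #define-ing the UNICODE and _UNICODE macros. UNICODE is read by the Windows headers and _UNICODE by the C runtime's <tchar.h>. This makes your program use the Unicode ("W") variants of the Win32 functions, which take UTF-16 strings of wchar_t.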
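The following is a minimal, portable sketch of the mechanism in standard C++. TextLengthA, TextLengthW and TextLength are hypothetical names. They imitate how <windows.h> puts an A and a W function behind one macro, as the real headers do for MessageBox, CreateFile and many others.

```cpp
#include <cstddef>
#include <cstring>
#include <cwchar>

// Hypothetical A/W pair following the Win32 naming convention.
// Both return the string length in characters.
std::size_t TextLengthA(const char* s)    { return std::strlen(s); }
std::size_t TextLengthW(const wchar_t* s) { return std::wcslen(s); }

// The undecorated name is a macro whose expansion depends on UNICODE.
#ifdef UNICODE
#define TextLength TextLengthW
#else
#define TextLength TextLengthA
#endif
```

With UNICODE undefined, TextLength("abc") expands to TextLengthA and returns 3. If you define UNICODE, the same call becomes TextLengthW, which needs a wide literal such as L"abc". String types and literals therefore have to change along with the macro.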
Do you have any advice on the steps I need to take to make my application cross-encoding compatible?
Each Win32 function that takes or returns a string has two variants: an ANSI one with an A suffix and a Unicode one with a W suffix (for example, MessageBoxA and MessageBoxW). The name you call, such as MessageBox, is a macro that resolves to one of them depending on whether the UNICODE macro is defined. So you should define the macro and switch to the Unicode versions of the functions, for example by:
Replacing every std::string with std::wstring,
Replacing every char with wchar_t (and every char* with wchar_t*),
Replacing every string literal "" with L"",
Making use of the TCHAR support in Windows, etc.
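The TCHAR approach lets one source file build either as ANSI or as Unicode. The sketch below imitates it with hypothetical names (my_tchar, MY_TEXT, my_tstring), because the real TCHAR, TEXT() and _T() come from the Windows headers and <tchar.h>.

```cpp
#include <string>

// Hypothetical analogues of TCHAR and TEXT(): narrow by default,
// wide when UNICODE is defined.
#ifdef UNICODE
typedef wchar_t my_tchar;
#define MY_TEXT(s) L##s
#else
typedef char my_tchar;
#define MY_TEXT(s) s
#endif

// Used instead of hard-coding std::string or std::wstring.
typedef std::basic_string<my_tchar> my_tstring;

my_tstring greet(const my_tstring& name) {
    return MY_TEXT("Hello, ") + name;
}
```

The standard library has no "tstring" typedef, so TCHAR-based projects usually define one like my_tstring themselves.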
As you pointed out, these are some of the things you will have to take care of; mind you, this is not a complete list.
Basically, you will have to use the Unicode versions of all the types and function calls in your code.
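Some std::string values may have to stay, for example data read from a UTF-8 file or returned by a portable library. In that case, a common approach is to keep them in UTF-8 and convert them only where they reach the W functions. On Windows this is usually done with MultiByteToWideChar(CP_UTF8, ...). The portable sketch below performs the conversion by hand into std::u16string, since on Windows wchar_t is likewise a 16-bit UTF-16 unit. utf8_to_utf16 is a hypothetical helper. It assumes well-formed UTF-8 and does no validation.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: decodes UTF-8 (assumed well-formed, not validated)
// and re-encodes each code point as UTF-16.
std::u16string utf8_to_utf16(const std::string& in) {
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char lead = static_cast<unsigned char>(in[i]);
        char32_t cp;
        std::size_t extra;  // number of continuation bytes after the lead byte
        if (lead < 0x80)      { cp = lead;        extra = 0; }
        else if (lead < 0xE0) { cp = lead & 0x1F; extra = 1; }
        else if (lead < 0xF0) { cp = lead & 0x0F; extra = 2; }
        else                  { cp = lead & 0x07; extra = 3; }
        // Each continuation byte contributes its low 6 bits.
        for (std::size_t k = 1; k <= extra && i + k < in.size(); ++k) {
            cp = (cp << 6) | (static_cast<unsigned char>(in[i + k]) & 0x3F);
        }
        i += extra + 1;
        if (cp <= 0xFFFF) {
            out.push_back(static_cast<char16_t>(cp));
        } else {
            // Code points above U+FFFF need a surrogate pair in UTF-16.
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}
```

Converting at the boundary keeps the rest of the code free to use std::string. Any text handed to Windows still goes through the Unicode APIs rather than the active code page.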