6
votes

I want my Win32 C++ application to run under either encoding (Unicode and ANSI). I am a little confused, though: what exactly is the difference between the two (or more?) encodings?

To make my Win32 application cross-encoding compatible, does that mean I have to go through my code and replace every std::string with std::wstring, every char with wchar_t, and every string literal ("") with L""?

What will happen if my application runs on a UNICODE machine & my application has one std::string in it?

Do you have any advice on the steps I need to take to make my application cross-encoding compatible? For example:
- Change all C strings and std::strings to their Unicode equivalents
- Change any Win32 functions to the Unicode version (e.g., change from getenv() to _wgetenv())

4
Are you seriously thinking about supporting Windows 9x? - Cheers and hth. - Alf
I did, 5 years ago, and even I didn't bother with ANSI. MSLU (Unicows.DLL) makes 9x appear sufficiently NT-like. - MSalters
What Alf said. If you're doing anything other than supporting full Unicode everywhere on Win32, you're either doing something wrong or are in the terrible situation of supporting operating systems beyond obsolete... - Jewel S

4 Answers

0
votes

You can use TCHAR in your case.

When UNICODE is defined, TCHAR is WCHAR. Otherwise, TCHAR is CHAR.

If you want to use std::string, I recommend the following.

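 #include <string>

 // A macro name cannot contain "::", and adding names to namespace std
 // is not allowed, so define the alias in your own scope instead.
 #ifdef UNICODE
 typedef std::wstring tstring;
 #else
 typedef std::string tstring;
 #endif

and,

Use tstring in your program.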

6
votes

What will happen if my application runs on a UNICODE machine & my application has one std::string in it?

Computers are not ANSI or Unicode; the operating systems that run on them are. The last version of Windows that didn't support Unicode was Windows 3.11 for Workgroups. If you run an ANSI-compiled application on a Unicode version of Windows, it still runs: the ANSI variants of the API functions convert their strings and call the Unicode variants.

What exactly is the difference between the two(or more?) encodings?

What is ASCII?
ASCII is a seven-bit encoding technique which assigns a number to each of the 128 characters used most frequently in American English. This allows most computers to record and display basic text. ASCII does not include symbols frequently used in other countries.

What is Unicode?
One major drawback of ASCII is that it has only 128 characters, and even its 8-bit extensions have only 256. However, languages such as Japanese and Chinese have thousands of characters, so ASCII does not work in these situations. The result was Unicode, which originally allowed for up to 65,536 different characters and today defines a code space of 1,114,112 code points.

Unicode is an attempt by ISO and the Unicode Consortium to develop a coding system for electronic text that includes every writing system in existence. Unicode text is stored in 8-, 16-, or 32-bit code units depending on the encoding form (UTF-8, UTF-16, or UTF-32), so Unicode documents can require twice as much disk space as ASCII or Latin-1 documents, or more. The wide-character Win32 APIs use UTF-16. The first 256 code points of Unicode are identical to Latin-1.
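To make the 8-, 16-, and 32-bit point concrete, here is a minimal sketch that counts how many code units one code point needs in each encoding form. The function names are made up for illustration, and the code assumes it is given a valid code point (at most 0x10FFFF and not a surrogate).

```cpp
#include <cstddef>

// Code units needed to store one Unicode code point in each encoding form.
// Assumption: cp is a valid scalar value (<= 0x10FFFF, not a surrogate).

// UTF-8 uses 8-bit units: 1 to 4 of them per code point.
std::size_t utf8_units(char32_t cp) {
    if (cp < 0x80) return 1;      // ASCII range fits in one byte
    if (cp < 0x800) return 2;
    if (cp < 0x10000) return 3;
    return 4;
}

// UTF-16 uses 16-bit units: 1 inside the Basic Multilingual Plane,
// 2 (a surrogate pair) above it.
std::size_t utf16_units(char32_t cp) {
    return cp < 0x10000 ? 1 : 2;
}

// UTF-32 always uses a single 32-bit unit.
std::size_t utf32_units(char32_t) {
    return 1;
}
```

For instance, the euro sign U+20AC takes three bytes in UTF-8 but a single 16-bit unit in UTF-16, while a character outside the Basic Multilingual Plane such as U+1F600 takes four bytes in either UTF-8 or UTF-16.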

In Win32, Unicode builds are enabled by defining the UNICODE and _UNICODE macros (UNICODE is read by the Windows headers, _UNICODE by the C runtime headers such as tchar.h). This, in turn, causes your program to use the Unicode variants of the Win32 functions.
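The mechanism can be sketched without any Windows headers. In the example below, GreetA, GreetW, and Greet are hypothetical names that only imitate the A/W naming pattern; the real headers do the same kind of mapping for names such as CreateFile.

```cpp
#include <string>

// Hypothetical API with a narrow ("A") and a wide ("W") variant.
std::string GreetA(const char* name) {
    return std::string("Hello, ") + name;
}

std::wstring GreetW(const wchar_t* name) {
    return std::wstring(L"Hello, ") + name;
}

// The header maps the generic name to one variant at compile time,
// depending on whether UNICODE is defined.
#ifdef UNICODE
#define Greet GreetW
#else
#define Greet GreetA
#endif
```

Code that calls Greet never names a variant itself; the choice is made once, by the preprocessor, for the whole translation unit. Both variants remain callable by their full names regardless of the setting.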

Do you have any advice on the steps I need to take to make my application cross-encoding compatible?

Each Win32 function that takes or returns a string has two variants, one for ANSI (suffix A) and one for Unicode (suffix W). A call resolves to one of them depending on whether or not the UNICODE macro is defined. So you should define the macro and start using the Unicode versions of the functions. For example:

Replacing every std::string with std::wstring,
Replacing every char with wchar_t,
Replacing every string literal ("") with L"",
Making use of the TCHAR support in Windows, etc.

As you pointed out, these are things you will have to take care of. Mind you, this is not a complete list.

Basically, you will have to use the Unicode versions of the types and function calls throughout your code.

3
votes

The last version of Windows that did not use Unicode internally was Windows ME. The recommendation for new code is to use Unicode exclusively. Some conversion may be necessary when you need to read and write files that are encoded with a specific code page.

You're on the right track with your initial thoughts. If you're using Microsoft's CString, it comes in two versions, CStringA and CStringW: change one compiler definition and every CString in your code becomes a CStringW, and everything will just work. You should use std::wstring instead of std::string. Prefix every string literal with L, as in L"", or use Microsoft's macro _T(""), which expands to the same thing in a Unicode build.
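As a rough illustration of how a _T-style macro can work, here is a portable sketch. MY_UNICODE, my_tchar, MY_T, and my_tstring are hypothetical stand-ins for _UNICODE, TCHAR, _T, and a tstring typedef; they are not the definitions from Microsoft's headers.

```cpp
#include <string>

// Hypothetical stand-ins: MY_UNICODE for _UNICODE, my_tchar for TCHAR,
// MY_T for _T.
#ifdef MY_UNICODE
typedef wchar_t my_tchar;
#define MY_T(s) L##s   // token pasting turns "abc" into L"abc"
#else
typedef char my_tchar;
#define MY_T(s) s      // narrow build: the literal is left unchanged
#endif

// A string type that follows the character type of the build.
typedef std::basic_string<my_tchar> my_tstring;

my_tstring default_title() {
    return MY_T("Untitled");
}
```

The same source line compiles to a narrow or a wide literal depending on the switch, which is why the macro is convenient if both build modes must be kept alive. If you only ever build for Unicode, writing L"" directly is simpler.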

3
votes

When you compile a program for ANSI or Unicode, you're affecting two things.

  1. Which set of APIs gets called. Suppose your code calls CreateFile(). The actual API called is either CreateFileA() or CreateFileW() (ANSI or Wide, i.e. Unicode) depending on your compiler setting. Internally the NT kernel uses Unicode for all APIs. The ANSI APIs simply convert their string parameters to Unicode and call the Unicode APIs. Many APIs are Unicode only.
  2. How T* macros are expanded. TCHAR will eventually be expanded to char in ANSI mode, wchar_t in Unicode mode.
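The "convert, then forward" pattern of the ANSI variants can be sketched as follows. NameLengthA, NameLengthW, and Widen are hypothetical names, and the widening step is deliberately simplified: it zero-extends each byte, which is only correct for ASCII or Latin-1 text, whereas Windows converts according to the active ANSI code page.

```cpp
#include <cstddef>
#include <string>

// Hypothetical wide implementation: the only place real work happens.
std::size_t NameLengthW(const std::wstring& name) {
    return name.size();
}

// Simplified widening (assumption: input is ASCII or Latin-1).
// Each byte is zero-extended to one wide character.
std::wstring Widen(const std::string& narrow) {
    std::wstring wide;
    wide.reserve(narrow.size());
    for (char c : narrow) {
        wide.push_back(static_cast<wchar_t>(static_cast<unsigned char>(c)));
    }
    return wide;
}

// Hypothetical narrow variant: converts its argument, then forwards
// to the wide implementation.
std::size_t NameLengthA(const std::string& name) {
    return NameLengthW(Widen(name));
}
```

This is also why calling the wide variant directly avoids a conversion on every call.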

Things like std::string and std::wstring are not affected until you need to call an API and want to pass a string to it. The choice of string vs. wstring should be determined by your program's needs and not by whether it's compiled for ANSI or Unicode.

You can use ATL to easily convert strings as necessary.

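// assume compiled for Unicode
#include <atlbase.h>   // brings in the ATL conversion macros
#include <string>

void myfunc() {
   USES_CONVERSION;   // declares the locals the conversion macros rely on

   std::string filename = "...";
   // A2W converts the narrow string to a wide string allocated on the stack
   HANDLE hFile = CreateFile(A2W(filename.c_str()), ...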

or, if you prefer, you can use A2T() and your code will work whether it's compiled for ANSI or Unicode.