0
votes

I'm developing a Win32 API Wrapper. To make it Unicode-compliant, I do the following:

#ifndef UNICODE
#define gchar char
#define gstrcpy strcpy
#define gstrncpy strncpy
#define gstrlen strlen
#define gstrcat strcat
#define gstrncat strncat
#define gstrcmp strcmp
#define gstrtok strtok
#else
#define gchar wchar_t
#define gstrcpy lstrcpy
#define gstrncpy lstrncpy
#define gstrlen lstrlen
#define gstrcat lstrcat
#define gstrncat lstrncat
#define gstrcmp lstrcmp
#define gstrtok lstrtok
#endif

I also provide

#define uni(s) TEXT(s)

My test consisted of a window that creates a message box via

msg (uni("Left-click"));

whenever the user left-clicks the window. The problem appears only when I #define UNICODE: after 4 or 5 of these message boxes have been closed (no matter how many were created), the next message box shown, whether it is a new one or the one underneath the last one closed, causes the program to exit with 0xC0000005 (access violation). Without UNICODE defined, everything works perfectly. My msg function is as follows:

dword msg (cstr = uni(""), cstr = uni(""), hwin = null, dword = 0);
...
dword msg (cstr lpText, cstr lpCaption, hwin hWnd, dword uType)
{
    return MessageBox (hWnd, lpText, lpCaption, uType);
}

where dword is DWORD, cstr is pchar, which is gchar *, which can be char * or wchar_t *, hwin is HWND, and null is 0.

It probably isn't the message box doing this, but I haven't done any other text-related stuff with the testing so I'll see if it crashes some other way too.

Does anyone know why this would happen? The difference between MB characters and unicode shouldn't cause the program to repeatedly crash. I can upload the headers and the test too, if needed.

Edit: I just found out creating one message and then closing the actual window will result in the same crash.

SOURCE CODE: Here's the link for the source. Please keep in mind: a) I only took one first-year programming course, ever (C++). b) My wrapper's purpose is to make writing win32 apps as easy as possible. c) I like to make things of my own (string class etc).

Also forgot this (duh), I'm using Code::Blocks (MinGW).

Edit: I didn't realize before, but the program is trying to access memory at 0x00000000. This is what's causing the problem, but I have no idea why it would be trying to do this. I believe the instruction trying to access it is located somewhere in winnt.dll, but having never learned how to debug, I'm still trying to figure out how to find the information I need.

Edit: Now, without changing it, but running it on a different computer, it's referencing 0x7a797877 instead of 0.

Edit: Changing the window procedure to include WM_LBUTTONDOWN and call msg() inside, rather than calling the added procedure makes the program work perfectly. Something with the way addmsg() and the window procedure are coded causes the _lpWindowName and _lpClassName to have corrupted data after a while, but non-pointer members are still preserved.

EDIT: After all of this mayhem I finally found out I was missing a single character in all of my source code. When I defined msgparams as Window, UINT, WPARAM, LPARAM and likewise with msgfillparams (except with names) I forgot to pass a reference. I was passing the Window by value! I'd still like to thank everyone who posted, as I did get my butt kicked debugger-wise and ended up learning a lot more about Unicode as well.

2
Could you upload the smallest example you can make which reproduces this problem? - Mike Kwan
why are you reinventing microsoft's T-macros for Windows 9x-support? why all this extra indirection? it's dumb. - Cheers and hth. - Alf
@chris: learn to use your IDE's debugger. Run the app inside the debugger, and let it show you exactly WHERE the AV is occurring, then you can troubleshoot WHY it is occurring. - Remy Lebeau
Alf, I personalised my wrapper a lot. I'm changing to reflect the tchar.h ones for gchar though, I just like gchar more than tchar. Remy, I'll have info from that before too long (night time now). Mike, here is the full source code. Changing the mentioned things tomorrow hopefully, but as I'm still figuring out what types of coding work well, input on techniques is appreciated. - chris
By the way, you shouldn't be using __TEXT() - the one with two underscores - it's an internal helper that TEXT() uses so that #define'd strings like __FILE__ can be processed correctly. If you try to use __TEXT(__FILE__), you'll get a compiler error saying that L__FILE__ can't be found. TEXT(__FILE__) will correctly give you a UNICODE string representing the current source file. - BrendanMcK

2 Answers

2
votes

You should do your homework before asking questions on SO. My impression is that you have almost no idea about how Unicode works on Windows, and it would take many pages to explain.

Porting an application from ANSI to Unicode is a big deal on Windows. It may be reasonable to pay someone with experience to do this.

Essentially everything that worked with char will have to work with wchar_t.

The entire API has other functions for this, but you should start by using the Windows support for it, not by writing your own macros. The first step is to use _T, not W, so you'll be able to start changing code and still be able to compile in both Unicode and ANSI.

1
votes

Why are you even bothering with ANSI in the first place? All the TCHAR support dates back to a time when Win95 was commonplace, so developers had to write code that could compile as ANSI (for Win95) or UNICODE (for NT-based Windows). Now that Win95 is long obsolete, there's no need to bother with TCHAR: just go all-UNICODE, using L"Unicode strings" instead of TEXT() and wcs-versions of the CRT rather than the _t-versions.

Having said that, here are some common sources of errors with ANSI/UNICODE code that could explain some of what you are seeing:

One possibility is that there's a bug somewhere that's corrupting the stack - uninitialized variable, stack overrun, and the like. In unicode, any chars or strings on the stack may take up a different amount of space compared to the ANSI version, so variables will end up in different places relative to one another. Chances are you are 'getting lucky' in the ANSI build, and whatever is being corrupted isn't important data; but on the UNICODE build, something important on the stack is getting nuked. (For example, if you overflow a buffer on the stack, you could end up overwriting the return address that's also on the stack, likely causing a crash at the next function return.)

--

Watch out for cases where you are mixing up character counts and byte counts: with ANSI, you can use 'sizeof()' almost interchangeably with a character count (depending on whether you're counting the terminating NUL or not), but with UNICODE you can't. If you get them mixed up, you can get a buffer overrun very easily.

For example:

// Incorrectly passing byte count instead of character count
WCHAR szWindowName[32];
GetWindowTextW( hwnd, szWindowName, sizeof(szWindowName) );
  • this can cause a buffer overrun (leading to crash - if you're lucky - or silently corrupted data and incorrect results later on if you're not lucky) since it's passing 64 - the size in bytes - to GetWindowText, instead of 32, the size in characters.

On Windows, use ARRAYSIZE(...) instead of sizeof() to get the number of elements in an array rather than the byte-size of the array.

--

Another thing to watch for is any strings where you have used casts to "force" them into CHAR or WCHAR to avoid compiler errors: eg.

// Incorrectly calling ANSI function with UNICODE strings...
MessageBoxA(hwnd, (LPCSTR)L"Unicode Title", (LPCSTR)L"Unicode content", MB_OK);

This type of usage typically results in just the first character of the string showing.

// Incorrectly calling UNICODE function with ANSI strings...
MessageBoxW(hwnd, (LPCWSTR)"ANSI Title", (LPCWSTR)"ANSI content", MB_OK);

This is trickier: you may get a string of garbage, or you could get an error of some kind.

These cases are easy to spot since there are casts. Generally speaking, casts should be viewed as 'red flags' and avoided at all costs. Don't use them to avoid a compiler error; instead, fix the issue that the compiler is warning about.

Also watch out for cases where you can get these mixed up but where the compiler won't warn you - eg with printf, scanf and friends: the compiler doesn't check the argument lists:

// Incorrectly calling ANSI function with UNICODE string - compiler won't warn you here...
LPCWSTR pString = L"I'm unicode!";
printf("The result is: %s\n", pString);