I am writing a C++ memory profiler.
As is well known, an executable may contain many global objects, and the DLLs it loads may contain global objects as well. These objects are constructed during CRT initialization: for the executable, that happens after the CRT entry point starts and before WinMain is called; for a DLL, the entry point is _DllMainCRTStartup, which constructs all of the DLL's global objects before calling DllMain.
A global object's constructor may allocate memory, so a memory profiler must be initialized before any global object is constructed. After a lot of research, I have found that this is not an easy job.
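To make the problem concrete, here is a minimal sketch of a global object that allocates before main or WinMain is reached. It is only an illustration, not my actual profiler: the names g_alloc_count, Registry, g_registry and allocation_count are made up for this example, and replacing the global operator new stands in for whatever hooking mechanism a real profiler would use.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdlib>
#include <new>
#include <vector>

// Constant-initialized (no dynamic initializer), so it is already valid
// when operator new is called during static initialization.
static std::atomic<std::size_t> g_alloc_count{0};

// Replacement global operator new: counts every allocation it serves.
void* operator new(std::size_t size) {
    ++g_alloc_count;
    if (void* p = std::malloc(size ? size : 1)) {
        return p;
    }
    throw std::bad_alloc();
}

void operator delete(void* p) noexcept { std::free(p); }
void operator delete(void* p, std::size_t) noexcept { std::free(p); }

// Hypothetical global object whose constructor allocates memory.
struct Registry {
    std::vector<int> slots;
    Registry() { slots.reserve(64); }  // heap allocation during CRT init
};

Registry g_registry;  // constructed before main/WinMain runs

// Number of allocations seen so far by the replacement operator new.
std::size_t allocation_count() { return g_alloc_count.load(); }
```

If allocation_count() is called as the very first statement of main, it should already report at least one allocation, made while g_registry was being constructed. The counter only works here because it needs no dynamic initialization of its own; a profiler whose state is set up later would miss these early allocations, which is exactly the problem.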
One idea is to call CreateProcess with the CREATE_SUSPENDED flag to get the first chance to run code in the target process. After that, I use CreateRemoteThread to call LoadLibrary in the target process, which loads an injection DLL and initializes it. But this doesn't work, because all the implicitly linked DLLs of the executable are loaded first, before my injection DLL. Maybe CreateRemoteThread triggers this behavior?
So, how do we get the first chance?