6
votes

Here's the setup:

My coworker has a Fedora x86_64 machine with a gcc 4.3.3 cross compiler (from buildroot). I have an Ubuntu 9.04 x86_64 machine with the same cross compiler.

My coworker built a library + test app that works on a test machine. I compiled the same library and test app, and my build crashes on the same test machine.

As far as I can tell, gcc was built against the buildroot-compiled uClibc, so: same code, same compiler. What kinds of host machine differences would impact cross compiling?

Any insight appreciated.

Update: To clarify, the compilers are identical. The source code for the library and test app is identical. The only difference is that the test app + lib have been compiled on different machines.

It's hard to know without being able to inspect the machines or the code. Can you tell us anything about the nature of the crash? What type of crash is it? Is it taking place in your code or in a library that it depends on? - Evan Shaw
If everything is identical, how different are the output files? - Kris Kumler
How did it go, did you find the problem? - Johan
Unfortunately, I did not find what was causing the crash. Something in the environment, or a bug in the source. I will never know :( - EightyEight

6 Answers

7
votes

If your code crashes (I assume you get a sigsegv), there seems to be a bug. It's most likely some kind of undefined behaviour, like using a dangling pointer or writing over a buffer boundary.

The unfortunate thing about undefined behaviour is that it may appear to work on some machines. I think you are experiencing such an event here. Try to find the bug and you'll know what happens :-)
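As an illustration of the "writing over a buffer boundary" case, here is a minimal sketch of a bounded copy. The function name copy_bounded is made up for this example and is not from the asker's code; the point is only that an unchecked strcpy() into a too-small buffer is undefined behaviour that can happen to work in one build and crash in another, while a checked copy behaves the same everywhere.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper: copies src into dst, truncating to fit, and always
// writes a terminating '\0'. Returns the number of characters copied.
//
// The buggy alternative would be strcpy(dst, src) with no size check:
// if src is longer than the buffer, the write runs past the end, which
// is undefined behaviour.
std::size_t copy_bounded(char* dst, std::size_t dst_size, const char* src) {
    if (dst_size == 0) {
        return 0;                      // no room even for the terminator
    }
    std::size_t n = std::strlen(src);
    if (n > dst_size - 1) {
        n = dst_size - 1;              // leave one byte for '\0'
    }
    std::memcpy(dst, src, n);
    dst[n] = '\0';
    return n;
}
```

With a 4-byte buffer, copying "abcdef" stores "abc" and returns 3 instead of overwriting whatever happens to sit after the buffer.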

3
votes

In what way does it crash? Can you be more specific and provide output, return codes, etc.? Have you tried adding some useful printf() calls?

And, I think we need a few more details here:

  1. Does the testapp link to the library?

  2. Is the library static or dynamic?

  3. Is the library in the library search path, or have you added its directory to ld.so.conf?

  4. Are you following any installation procedures for the library and testapp?

  5. Are the two libraries and testapps bit-for-bit compatible? Do you expect them to be?

  6. Are you running as the same user as your coworker, with same environment and permissions?

3
votes

Obviously, something isn't identical.

Try using objdump and its many options, especially -d, to determine what is different.
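Before reading disassembly, it can help to find out where the two builds first diverge. The following is only a rough sketch of that idea; read_file and first_difference are hypothetical helper names, and the offset is a position in the file, not a symbol or address, so it still has to be mapped back to a section with objdump.

```cpp
#include <cstddef>
#include <fstream>
#include <iterator>
#include <vector>

typedef std::vector<unsigned char> Bytes;

// Hypothetical helper: reads a whole file into memory.
// Returns an empty vector if the file cannot be opened.
Bytes read_file(const char* path) {
    std::ifstream in(path, std::ios::binary);
    return Bytes((std::istreambuf_iterator<char>(in)),
                 std::istreambuf_iterator<char>());
}

// Returns the offset of the first differing byte, or -1 if the two
// buffers are identical. If one buffer is a strict prefix of the other,
// the length of the shorter one is returned.
long first_difference(const Bytes& a, const Bytes& b) {
    std::size_t n = a.size() < b.size() ? a.size() : b.size();
    for (std::size_t i = 0; i < n; ++i) {
        if (a[i] != b[i]) {
            return static_cast<long>(i);
        }
    }
    return a.size() == b.size() ? -1 : static_cast<long>(n);
}
```

Calling first_difference(read_file(a), read_file(b)) on the two library builds gives a starting point for the comparison; a result of -1 would mean the files are byte-for-byte identical and the problem lies elsewhere.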

You didn't make a point of it, so I am going to guess binutils is the difference. That is the set of tools used in building binaries. It includes ld, as and objdump.

Cross compilers need their own set of binutils for the target architecture. However, unlike GCC, I do not believe binutils does a double-bootstrap build-and-verify step, so it is possible that some difference from the original x86_64 build environment made it into them.

I'd try building the binutils packages for ARM again, using the ARM cross compiler. See if that makes a difference.

It's something I have seen in regular x86 Gentoo stage1 installs too: after getting the bootstrap system and compilers installed and updated, a Gentoo user is well advised to rebuild the system again using the updated tools.

1
votes

What arch is your target (the test machine)?

Are you using the distribution-provided compilers? They usually have quite a large set of patches applied to gcc. For example, on Gentoo there are about 20 patches, and Fedora and Ubuntu won't be that different. Not all patches are 100% fine, though :-( So the compilers may in reality differ.

You may look for a "vanilla" version of gcc on your distribution, maybe it does the trick.

1
votes

I knew someone who had a similar experience in college. Basically, in a lab of identical machines, his project worked on his development box, but crashed horribly on the professor's box. These were two machines of the same arch, running the same version of the OS.

It boiled down to an uninitialized pointer somewhere.

He had code which looked like:

if(p == NULL) {
    p = f();
}

Since p was a member of a class that was allocated on the heap, its value was effectively random and occasionally happened to be NULL, which made things work OK. The problem was that on some machines the memory for p happened to be NULL at program startup, but on the prof's box it was not. The fix, of course, was to properly initialize p to NULL, and all was well.

You may be experiencing something like this, or some other type of undefined behavior, which is a fancy way of saying "it may or may not work as expected for any or no reason at all".
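A minimal sketch of what the fixed class might have looked like. The class name Lazy, the variable g_value and the body of f() are invented for illustration; only the `if (p == NULL) p = f();` pattern comes from the story above.

```cpp
#include <cstddef>

// Hypothetical stand-in for the f() in the snippet above.
static int g_value = 42;
int* f() { return &g_value; }

class Lazy {
public:
    // The initializer list gives p a known value before anything reads it.
    // Without it, a heap-allocated Lazy would start with an indeterminate p.
    Lazy() : p(NULL) {}

    int* get() {
        if (p == NULL) {   // well-defined now: p was set in the constructor
            p = f();
        }
        return p;
    }

private:
    int* p;
};

// Allocates a Lazy on the heap, as in the story, and checks the lazy init.
bool lazy_init_works() {
    Lazy* obj = new Lazy;
    bool ok = (obj->get() == &g_value);
    delete obj;
    return ok;
}
```

If the `: p(NULL)` initializer is removed, the comparison in get() reads an indeterminate pointer, and whether it happens to be NULL depends on what was in that heap memory before, which can differ from one machine or build to the next.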

1
votes

As a stab in the dark, I'd look for uninitialized variables. Make sure all local and global variables are assigned a value. Double check that constructors have initializers for ALL data members.