6
votes

So, I am confused about how jump instructions work in an operating system. I thought that a jump instruction sets the value of the processor's program counter, but programs can be run at various locations in memory. I see that x86 has a JMP EAX instruction, but my C++ code doesn't seem to use it. I compiled some C++ code in VC++:

int main()
{
    int i = 0;
    while (i < 10)
    {
        ++i;
        if (i == 7)
        {
            i += 1;
            continue;
        }
    }
}

This translates to:

    int main()
    {
00411370  push        ebp  
00411371  mov         ebp,esp 
00411373  sub         esp,0CCh 
00411379  push        ebx  
0041137A  push        esi  
0041137B  push        edi  
0041137C  lea         edi,[ebp-0CCh] 
00411382  mov         ecx,33h 
00411387  mov         eax,0CCCCCCCCh 
0041138C  rep stos    dword ptr es:[edi] 
        int i = 0;
0041138E  mov         dword ptr [i],0 
        while (i < 10)
00411395  cmp         dword ptr [i],0Ah 
00411399  jge         main+47h (4113B7h) 
        {
            ++i;
0041139B  mov         eax,dword ptr [i] 
0041139E  add         eax,1 
004113A1  mov         dword ptr [i],eax 
            if (i == 7)
004113A4  cmp         dword ptr [i],7 
004113A8  jne         main+45h (4113B5h) 
            {
                i += 1;
004113AA  mov         eax,dword ptr [i] 
004113AD  add         eax,1 
004113B0  mov         dword ptr [i],eax 
                continue;
004113B3  jmp         main+25h (411395h) 
            }
        }
004113B5  jmp         main+25h (411395h) 
    }
004113B7  xor         eax,eax 
004113B9  pop         edi  
004113BA  pop         esi  
004113BB  pop         ebx  
004113BC  mov         esp,ebp 
004113BE  pop         ebp  
004113BF  ret              

So I'm confused: for the instruction jmp 411395h, does this imply that the program is always loaded at the same place in memory? That seems illogical.

Bear in mind that modern CPUs tend to support virtual memory, meaning each program has its own address space. That is, the byte at 0x12345678 in one process can be a different point in real memory than the byte at 0x12345678 in another process. (comment by Joey Adams)

7 Answers

6
votes

No. There are two things possibly at play here; you don't specify an OS, so I'll give a general answer.

The first is that an executable file is rarely in its final form. As a simplification, compilation turns source into object files, and linking combines object files into an executable.

But the executable has to be loaded into memory and, at that stage, there can be even more modifications done. One of these modifications may be to fix up memory references within the executable to point to memory that has been loaded at different locations.

This can be achieved by the executable file containing a list of addresses within itself that need to be fixed up at run time.
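A minimal sketch of such a fixup pass, assuming a deliberately simplified format: the image is a byte buffer, the linker's preferred base and the actual load base are known, and the fixup list is just the offsets at which a 32-bit little-endian absolute address is stored. `apply_fixups` and its parameters are hypothetical; real formats (for example PE base relocations) encode the list more compactly.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, simplified loader step. Each entry in fixup_offsets is the offset
// within `image` of a 32-bit little-endian absolute address that the linker
// computed assuming the image would be loaded at preferred_base.
void apply_fixups(std::vector<std::uint8_t>& image,
                  const std::vector<std::uint32_t>& fixup_offsets,
                  std::uint32_t preferred_base,
                  std::uint32_t actual_base) {
    // Unsigned arithmetic wraps modulo 2^32, so this also works when the
    // image ends up below its preferred base.
    const std::uint32_t delta = actual_base - preferred_base;
    for (std::uint32_t off : fixup_offsets) {
        assert(off + 4 <= image.size());  // the stored address must lie inside the image
        // Read the stored absolute address (little-endian, as on x86).
        std::uint32_t addr = std::uint32_t{image[off]}
                           | std::uint32_t{image[off + 1]} << 8
                           | std::uint32_t{image[off + 2]} << 16
                           | std::uint32_t{image[off + 3]} << 24;
        addr += delta;  // rebase it to the actual load address
        // Write the patched address back in place.
        for (int i = 0; i < 4; ++i)
            image[off + i] = static_cast<std::uint8_t>(addr >> (8 * i));
    }
}
```

For example, a hypothetical 5-byte instruction whose last four bytes hold the absolute address 411395h would have those bytes patched to 1511395h if the image were linked for 400000h but loaded at 1500000h.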

There is also a disconnect between virtual memory and physical memory in many modern operating systems.

When your process starts, you get your own (4 GB for 32-bit Windows, I believe) address space into which your process is loaded. The addresses within this address space have little relationship to your actual physical memory addresses, and the translation between the two is done by a memory management unit (MMU).

In fact, your process could be flying all over the physical address space as it's paged out and in. The virtual addresses will not change, however.
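A toy model of what the MMU does, assuming 4 KB pages and a single flat page table per process. Real 32-bit x86 paging walks a two-level page directory and page table, and the OS (not the program) fills them in. `PageTable` and the frame numbers used with it are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <unordered_map>

// Hypothetical, simplified per-process page table: maps a virtual page number
// (upper 20 bits of a 32-bit address, 4 KB pages) to a physical frame number.
struct PageTable {
    std::unordered_map<std::uint32_t, std::uint32_t> vpn_to_pfn;

    // Returns the physical address, or nullopt if the page is not present
    // (on real hardware this would raise a page fault for the OS to handle).
    std::optional<std::uint32_t> translate(std::uint32_t vaddr) const {
        const std::uint32_t vpn = vaddr >> 12;        // virtual page number
        const std::uint32_t offset = vaddr & 0xFFFu;  // offset within the 4 KB page
        auto it = vpn_to_pfn.find(vpn);
        if (it == vpn_to_pfn.end()) return std::nullopt;
        assert(it->second < (1u << 20));  // a frame number must fit in 20 bits
        return (it->second << 12) | offset;
    }
};
```

Two processes can then use the same virtual address (say 411395h) while their page tables point at different physical frames, and when a page is paged out and back in, the OS only updates the table entry; the virtual address the program uses stays the same.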

6
votes

As other people wrote, there are relative jump and relative call instructions, which essentially add a fixed value to eip and therefore do not depend on the program's location in memory; compilers prefer to use these whenever possible. You can look at the code bytes to see exactly which instructions your compiler used. However, I assume you are asking about jumps/calls to absolute addresses.

When the linker generates an executable, it generates absolute addresses assuming a particular base address; the Microsoft linker uses 400000h by default for executables. When the OS loads an executable or a DLL, it "fixes up" all absolute addresses by adding the difference between the address at which the image was actually loaded and the base address the linker assumed. All executable formats except .com specify some sort of fixup table, which lists every location in the executable that has to be patched in this way. Therefore, if the OS loads your executable at base address, say, 1500000h, your jump will look like jmp 1511395h. You can check this by looking at the actual code bytes with a debugger.
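The loader's arithmetic for the example above, as a small sketch. `rebase` is a hypothetical helper; a real loader applies the same delta to every entry in the fixup table rather than to one value.

```cpp
#include <cstdint>

// Hypothetical helper: move an absolute address computed by the linker for
// preferred_base so that it is correct for an image loaded at actual_base.
// The offset from the image base (here 11395h) is what stays the same.
constexpr std::uint32_t rebase(std::uint32_t linked_addr,
                               std::uint32_t preferred_base,
                               std::uint32_t actual_base) {
    return linked_addr - preferred_base + actual_base;
}

// Linked for 400000h, loaded at 1500000h: 411395h becomes 1511395h.
static_assert(rebase(0x411395, 0x400000, 0x1500000) == 0x1511395,
              "jump target follows the image");
```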

Older Windows versions preferred to load executables at the base address used by the linker; this created a security risk, because an attacker would know in advance what is where in memory. This is why newer systems use base address randomization (ASLR).

3
votes

No. On x86 (and other architectures, too), most jump instructions are IP-relative: the binary machine codes for the instructions represent an offset from the current instruction pointer. So, no matter what virtual address the code gets loaded at, the jump instructions function correctly.
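To see why the load address doesn't matter, compute the displacement a relative jump stores, once at the linked address and once after the whole image has been moved. A sketch; `rel_displacement` is a hypothetical helper, and the moved addresses assume an example load base of 1500000h instead of 400000h.

```cpp
#include <cstdint>

// Displacement stored in an x86 relative JMP/CALL: it is measured from the end
// of the jump instruction (the next instruction's address) to the target.
constexpr std::int32_t rel_displacement(std::uint32_t jump_addr,
                                        std::uint32_t jump_len,
                                        std::uint32_t target) {
    return static_cast<std::int32_t>(target - (jump_addr + jump_len));
}

// The 2-byte "continue" jump from the question, at its linked address...
static_assert(rel_displacement(0x4113B3, 2, 0x411395) == -0x20, "");
// ...and after the whole image is moved by 1100000h: the same displacement,
// so the same instruction bytes still reach the right target.
static_assert(rel_displacement(0x15113B3, 2, 0x1511395) == -0x20, "");
```

Because the jump and its target move together, their distance never changes, which is why the loader does not need to patch these instructions at all.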

3
votes

The memory locations are relative to the process. main is always at the same spot in memory, relative to the beginning of the program.

3
votes

Relative jumps take the current value of the instruction pointer (on x86, the address of the instruction following the jump) and add a signed offset to compute the address to be jumped to.

If you look at your code

004113B3  jmp         main+25h (411395h) 
004113B5  jmp         main+25h (411395h) 
004113B7  xor         eax,eax 

you'll note that each jmp instruction is 2 bytes long (1 byte for the opcode, 1 byte for the offset) and so cannot possibly store an absolute 4-byte address.
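Working backwards from the listing: a 2-byte unconditional JMP on x86 is the short form, opcode EBh followed by a signed 8-bit offset from the next instruction. Under that assumption, the two jumps above would be encoded as EB E0 and EB DE (derived from the addresses, not read from the binary). A small decoder sketch; `short_jmp_target` is a hypothetical helper:

```cpp
#include <cassert>
#include <cstdint>

// Decode an x86 short jump (opcode EB, followed by a signed 8-bit displacement)
// located at address `addr`, returning the jump target.
std::uint32_t short_jmp_target(std::uint32_t addr, const std::uint8_t bytes[2]) {
    assert(bytes[0] == 0xEB && "not a short JMP");
    const auto disp = static_cast<std::int8_t>(bytes[1]);  // reinterpret offset as signed
    const std::uint32_t next = addr + 2;                   // IP after the 2-byte jump
    // Sign-extend the offset to 32 bits; unsigned addition wraps modulo 2^32.
    return next + static_cast<std::uint32_t>(static_cast<std::int32_t>(disp));
}
```

Decoding EB E0 at 4113B3h gives 4113B5h - 20h = 411395h; the same two bytes at any other address simply give a target 1Eh bytes before that address.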

Relative jumps are a basic feature of CPUs (at least of the 65xx, Z80, 8086 and 68000, from what I know), and are not related to such advanced features as virtual memory, memory mapping or address space randomization.

2
votes

Most chips have relative jumps (relative to the current location) and virtual addressing.

0
votes
int main()
    {
00411370  push        ebp  
00411371  mov         ebp,esp 
00411373  sub         esp,0CCh 
00411379  push        ebx  
0041137A  push        esi  
0041137B  push        edi  
0041137C  lea         edi,[ebp-0CCh] 
00411382  mov         ecx,33h 
00411387  mov         eax,0CCCCCCCCh 
0041138C  rep stos    dword ptr es:[edi] 
        int i = 0, j = 0;
0041138E  mov         dword ptr [i][j],0
        while (i < 10)
00411395  cmp         dword ptr [i][j],0Ah 
00411399  jge         main+47h (4113B7h) 
        {
            ++i;
0041139B  mov         eax,dword ptr [i][j] 
0041139E  add         eax,1 
004113A1  mov         dword ptr [i][j],eax 
            if (i == 7)
004113A4  cmp         dword ptr [i][j],7 
004113A8  jne         main+45h (4113B5h) 
            {
                i += 1;
004113AA  mov         eax,dword ptr [i][j] 
004113AD  add         eax,1 
004113B0  mov         dword ptr [i][j],eax 
                continue;
004113B3  jmp         main+25h (411395h) 
            }
        }
004113B5  jmp         main+25h (411395h) 
    }
004113B7  xor         eax,eax 
004113B9  pop         edi  
004113BA  pop         esi  
004113BB  pop         ebx  
004113BC  mov         esp,ebp 
004113BE  pop         ebp  
004113BF  ret