Assembly Language to Machine Code

Question

I created a simple C++ source file with the following code.

int main() {
    int a = 1;
    int b = 2;
    if (a < b) {
        return 1;
    }
    else if (a > b) {
        return 2;
    }
    else {
        return 3;
    }
}

I used the objdump command to get the assembly code for the above source code. The line

int b = 2; gets converted to mov DWORD PTR [rbp-0x4],0x2.

Its corresponding machine code (in hex) is c7 45 fc 02 00 00 00.

I want to know how assembly code is converted to binary machine code. I went through the Intel reference manual for x86-64, but I was not able to understand it, since I am new to low-level programming.

What do you mean by 'convert'? Using a program? Doing it manually? — Shiro
int b = 2; is NOT assembly language. The difference is that C is a compiled language, so the line int b = 2; may be implemented in many different ways (or even removed completely by the optimizer), depending on how the compiler decides to produce machine code that gives the results defined by the C language standard. Assembly language is different: an assembler is not a compiler of this kind. When you write add rax,rbx in assembly, it is assembled as exactly that instruction, not changed or removed by some kind of optimizer, so it is more like a 1:1 transformation. — Ped7g

fuz · Accepted Answer · 2017-05-24T15:12:18

You should read the Intel manuals; they explain how to do this. For a simpler reference, read this. The way x86 instructions are encoded is fairly straightforward, but the number of possibilities can be a bit overwhelming.

In a nutshell, an x86 instruction comprises the following parts, where every part except the opcode may be missing:

prefix opcode operands immediate
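To make that layout concrete, here is a minimal C++ sketch that splits the 7 bytes from the question back into those parts. It is hard-wired to this one instruction shape (no prefix, opcode C7, a ModR/M byte whose mod field says an 8-bit displacement follows, then a 32-bit immediate); the names `MovImmParts`, `split_example` and `kExample` are hypothetical, and this is not a general x86 decoder.

```cpp
#include <array>
#include <cstdint>

// Parts of the example instruction c7 45 fc 02 00 00 00.
// ModR/M plus displacement together form the "operands" part.
struct MovImmParts {
    std::uint8_t opcode;  // C7: mov r/m32, imm32
    std::uint8_t modrm;   // 45: selects [rbp + disp8]
    std::int8_t disp8;    // FC: signed 8-bit displacement
    std::int32_t imm32;   // 02 00 00 00: little-endian immediate
};

// Assumes exactly this shape; a real decoder would inspect prefixes,
// the opcode and the ModR/M mod field to decide what follows.
constexpr MovImmParts split_example(const std::array<std::uint8_t, 7>& b) {
    return MovImmParts{
        b[0],
        b[1],
        static_cast<std::int8_t>(b[2]),
        static_cast<std::int32_t>(std::uint32_t{b[3]} |
                                  (std::uint32_t{b[4]} << 8) |
                                  (std::uint32_t{b[5]} << 16) |
                                  (std::uint32_t{b[6]} << 24))};
}

constexpr std::array<std::uint8_t, 7> kExample{0xC7, 0x45, 0xFC, 0x02,
                                               0x00, 0x00, 0x00};
```

Splitting `kExample` this way gives opcode C7, ModR/M 45, displacement -4 and immediate 2.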

The prefix bytes may modify the behaviour of the instruction, but none are needed in your case. You can look up the opcode in a reference (I like this one). For example, mov r/m32, imm32 is C7 /0, which means: the opcode is C7, and the /0 says that the reg field of the ModR/M byte holds the value 0 as an opcode extension instead of naming a register operand. This instruction takes a 32-bit immediate, so the instruction has the form

C7 operand/0 imm32

The operand and the opcode extension are encoded as a ModR/M byte, followed by an optional SIB (scale-index-base) byte for some addressing modes and an optional 8-bit or 32-bit displacement. You can look up the value you need in the reference. In your case, you want to encode a memory operand [rbp] with a one-byte displacement (mod = 01), the opcode extension 0 in the reg field (reg = 000), and rbp as the base register (r/m = 101). Packing these fields as mod:reg:r/m gives 01 000 101 in binary, which is the ModR/M byte 45. So the encoding is:

C7 45 disp8 imm32
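The ModR/M byte packs three bit fields: mod in bits 7-6, reg in bits 5-3 and r/m in bits 2-0. A minimal C++ sketch of that packing, assuming the field values for this example (mod = 01 for an 8-bit displacement, reg = 0 for the /0 extension, r/m = 101 because rbp is register number 5); `make_modrm` is a hypothetical helper name.

```cpp
#include <cstdint>

// Pack mod (2 bits), reg (3 bits) and r/m (3 bits) into one ModR/M byte.
constexpr std::uint8_t make_modrm(unsigned mod, unsigned reg, unsigned rm) {
    return static_cast<std::uint8_t>(((mod & 0x3u) << 6) |
                                     ((reg & 0x7u) << 3) |
                                     (rm & 0x7u));
}

// mod = 0b01  -> memory operand followed by an 8-bit displacement
// reg = 0     -> the /0 opcode extension of C7, not a register operand
// r/m = 0b101 -> base register rbp
constexpr std::uint8_t kModrmRbpDisp8Ext0 = make_modrm(0b01, 0, 0b101);
```

`kModrmRbpDisp8Ext0` evaluates to 0x45 (binary 01 000 101).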

Now we encode the 8-bit displacement in two's complement. -4 corresponds to FC (256 - 4 = 252 = 0xFC), so this is

C7 45 FC imm32
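The two's complement step can be sketched in C++ as a plain conversion to an unsigned byte, which is defined to wrap modulo 256. The helper names `fits_disp8` and `encode_disp8` are hypothetical; the sketch assumes the caller has already checked the range, since a displacement outside -128..127 would need the 32-bit form (mod = 10) instead.

```cpp
#include <cstdint>

// True if the displacement can use the one-byte (disp8) encoding.
constexpr bool fits_disp8(int disp) {
    return disp >= -128 && disp <= 127;
}

// Converting a negative int to an unsigned type is well defined:
// the result is disp modulo 256, i.e. its two's complement byte.
// Precondition (assumed, not checked here): fits_disp8(disp).
constexpr std::uint8_t encode_disp8(int disp) {
    return static_cast<std::uint8_t>(disp);
}
```

For example, `encode_disp8(-4)` yields 0xFC, matching the byte in the objdump output.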

Lastly, we encode the 32-bit immediate, which you want to be 2. Note that it is stored in little-endian byte order (least significant byte first):

C7 45 FC 02 00 00 00
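The little-endian step amounts to emitting the least significant byte first. A minimal C++ sketch, using shifts so the result does not depend on the byte order of the machine running it; `encode_imm32_le` is a hypothetical name.

```cpp
#include <array>
#include <cstdint>

// Split a 32-bit immediate into four bytes, least significant first.
// The unsigned conversion keeps the two's complement bit pattern
// for negative values.
constexpr std::array<std::uint8_t, 4> encode_imm32_le(std::int32_t value) {
    const auto u = static_cast<std::uint32_t>(value);
    return {static_cast<std::uint8_t>(u),
            static_cast<std::uint8_t>(u >> 8),
            static_cast<std::uint8_t>(u >> 16),
            static_cast<std::uint8_t>(u >> 24)};
}
```

`encode_imm32_le(2)` gives the bytes 02 00 00 00, the last four bytes of the instruction.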

And that's how the instruction is encoded.
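Putting the steps together, a minimal C++ sketch of an encoder for just this instruction form, mov DWORD PTR [rbp+disp], imm, using C7 /0 with an 8-bit displacement. The function name `encode_mov_rbp_disp8_imm32` is hypothetical, and it assumes no prefixes are needed and that rbp is the only base register; a real assembler handles many more cases (other registers, SIB bytes, 32-bit displacements, REX prefixes).

```cpp
#include <array>
#include <cstdint>
#include <stdexcept>

// Encode "mov DWORD PTR [rbp+disp], imm" as C7 /0 with a disp8.
constexpr std::array<std::uint8_t, 7> encode_mov_rbp_disp8_imm32(
        int disp, std::int32_t imm) {
    if (disp < -128 || disp > 127) {
        // Would need mod = 10 and a 32-bit displacement instead.
        throw std::out_of_range("displacement does not fit in disp8");
    }
    const auto u = static_cast<std::uint32_t>(imm);
    return {
        0xC7,  // opcode: mov r/m32, imm32
        static_cast<std::uint8_t>((0b01 << 6) | (0 << 3) | 0b101),  // ModR/M: mod=01, reg=/0, r/m=rbp
        static_cast<std::uint8_t>(disp),  // disp8 in two's complement
        static_cast<std::uint8_t>(u),     // imm32, little-endian
        static_cast<std::uint8_t>(u >> 8),
        static_cast<std::uint8_t>(u >> 16),
        static_cast<std::uint8_t>(u >> 24),
    };
}
```

Called as `encode_mov_rbp_disp8_imm32(-4, 2)`, it yields C7 45 FC 02 00 00 00, the same bytes objdump showed for int b = 2;.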
