I'm trying to understand what each line does.
That would fall under the general category of learning assembly language. There are entire books written about this topic; some of them are probably even pretty good. You should purchase one. To ensure that you get maximum bang for your buck, be sure to select one that focuses on the architecture and operating system you're interested in. x86 assembly language is, of course, always the same, but the programming model differs enough between Windows and Linux that the differences would be confusing to a beginner.
If you're too cheap to buy a book, at least read Matt Pietrek's classic series of articles, "Just Enough Assembly To Get By", from the Microsoft Systems Journal. Start here, and proceed to the follow-up.
The first line is push ebp
. I know ebp
stands for base pointer. What is its function?
I see that in the second line the value in esp
is moved into ebp
and searching online I see that these first 2 instructions are very common at the beginning of an assembly program.
I'm new to assembly. Is ebp
used for stack frames, that is, when we have a function in our code? And is it optional for a simple program?
To understand this first line in isolation, you just need to know what a PUSH
instruction does. It pushes the operand (in this case, a register) onto the top of the stack. EBP
is the register that almost always contains the stack base pointer.
That doesn't tell you much about the purpose of this code, though. This line and the next one are part of the standard function prologue. Matt talks about that near the beginning of his very first article, in the "Procedure Entry and Exit" section. First, the stack base pointer from EBP
is saved by PUSH
ing it onto the stack. Then, the second instruction copies the value of ESP
into the EBP
register. This makes interacting with the stack throughout the function easier. Generally, the prologue section would end with an instruction that reserved an arbitrary amount of space on the stack for temporary variables (e.g., sub esp, 8
to reserve 8 bytes on the stack). This function doesn't need any.
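If it helps to see the bookkeeping, here is a minimal sketch in C that models the prologue and the matching epilogue on a toy stack. Everything in it (`ToyCpu`, `toy_push`, `toy_pop`, `toy_frame_roundtrip`) is a made-up name for illustration, not anything from the disassembly. The `sub esp, 8` step and the `mov esp, ebp` that undoes it are included only to show the general pattern; the function in the question has neither.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Toy model of a 32-bit x86 stack. All names here are illustrative. */
enum { STACK_BYTES = 64 };

typedef struct {
    uint8_t  mem[STACK_BYTES]; /* simulated stack memory */
    uint32_t esp;              /* offset of the current top of the stack */
    uint32_t ebp;              /* base (frame) pointer */
} ToyCpu;

/* push: move ESP down by 4, then store the value there. */
static void toy_push(ToyCpu *cpu, uint32_t value) {
    assert(cpu->esp >= 4);
    cpu->esp -= 4;
    memcpy(&cpu->mem[cpu->esp], &value, 4);
}

/* pop: load the value at ESP, then move ESP up by 4. */
static uint32_t toy_pop(ToyCpu *cpu) {
    uint32_t value;
    assert(cpu->esp + 4 <= (uint32_t)STACK_BYTES);
    memcpy(&value, &cpu->mem[cpu->esp], 4);
    cpu->esp += 4;
    return value;
}

/* Runs a prologue and epilogue; returns 1 if the caller's ESP and EBP
   are both restored afterwards, 0 otherwise. */
int toy_frame_roundtrip(uint32_t caller_ebp) {
    ToyCpu cpu = { {0}, STACK_BYTES, caller_ebp };
    uint32_t esp_on_entry = cpu.esp;

    toy_push(&cpu, cpu.ebp);  /* push ebp      (save caller's base pointer) */
    cpu.ebp = cpu.esp;        /* mov  ebp, esp (establish this frame)       */
    cpu.esp -= 8;             /* sub  esp, 8   (reserve 8 bytes of locals)  */

    cpu.esp = cpu.ebp;        /* mov  esp, ebp (discard the locals)         */
    cpu.ebp = toy_pop(&cpu);  /* pop  ebp      (restore caller's pointer)   */

    return cpu.esp == esp_on_entry && cpu.ebp == caller_ebp;
}
```

Calling `toy_frame_roundtrip` with any value for the caller's `EBP` shows the point of the pattern: whatever the function does in between, the caller gets its stack pointer and base pointer back unchanged.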
Yes, this prologue code is optional. If you don't need any stack space and/or you don't use EBP
-relative addressing, then you don't need the standard prologue. Optimizing compilers often omit it when possible.
Though are ebp
and esp
empty at the beginning?
No, of course they are not empty. If they were empty, the code wouldn't bother to save the value of EBP
or use the value of ESP
.
In fact, no registers are empty at the beginning of a function. They contain one of three things: values that the function's prototype (in conjunction with its calling convention) says they contain; values that you must preserve (that is, they must still have the same values when your function returns control that they did when your function was first called; these are called callee-save registers, and which ones they are differs depending on the calling convention); or what you can assume to be garbage values (these are the caller-save registers, and you are free to clobber them in the callee function's code).
Then push offset aHelloWorld; "Hello world\n"
The part after ;
is a comment so it doesn't get executed right? The first part instead adds the address containing the string Hello World to the stack, right? But where is the string declared? I'm not sure I understand.
aHelloWorld
is a piece of global data declared in the executable image. It was put there at link time, probably because the original code used a string literal. This instruction PUSH
es the offset
of that global data (that is, its address) onto the stack.
Yes, the part after the semicolon is a comment. The disassembler is adding this comment as a favor to you. It has looked up the value of aHelloWorld
, determined that it contains the string Hello world\n
, and placed that definition in-line, saving you from having to look up the data's value yourself.
Then call ds:__imp__printf
it seems it's a call to a function, anyway printf
is a builtin function right?
Yes, CALL
always calls a function. In this case, it is calling the printf
function. Is it a "built-in" function? That depends on your definition. From the perspective of assembly language, no: no function is built-in. printf
is a function provided by the C standard library. When the original code was compiled and linked, it was also linked with the C run-time library, which provides the C standard library functions, including printf
. Since this is MSVC, the __imp__
prefix is a big hint that the function being called is part of either the standard library or the Windows API. These are implicitly linked functions.
Looking up the printf
function shows that it takes a variable number of arguments. In the most common x86-32 calling conventions, these arguments are passed on the stack. So that explains why the previous instruction PUSH
ed the address of string data onto the stack: it's passing that address to the printf
function so that string can be printed to the standard output. It could have passed additional arguments to printf
, but it didn't, because it didn't need to: it just needed one to print a literal string.
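To make "variable number of arguments" concrete, here is a small C sketch of a variadic function. `sum_ints` is a hypothetical example of mine, not part of the original program; it only illustrates that the callee walks its arguments one at a time and has to be told how many there are (here by `count`, in the case of `printf` by the format string).

```c
#include <stdarg.h>

/* Hypothetical variadic function: adds up `count` int arguments.
   The callee cannot see how many arguments were passed; it relies on
   `count`, much like printf relies on its format string. */
int sum_ints(int count, ...) {
    va_list args;
    int total = 0;

    va_start(args, count);
    for (int i = 0; i < count; i++) {
        total += va_arg(args, int);  /* fetch the next argument */
    }
    va_end(args);

    return total;
}
```

For example, `sum_ints(3, 1, 2, 3)` returns 6. With a stack-based x86-32 convention such as `__cdecl`, a call like that would typically show up in a disassembly as several `PUSH` instructions before the `CALL`, one per argument.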
And does ds
stand for data segment register? Is it used because we are trying to access a memory operand that isn't on the stack?
Yes, DS is the data segment. Your disassembler is just being verbose here. In Windows, x86-32 uses a flat memory model, so you can basically ignore the segment registers entirely and still understand everything that is going on perfectly well.
then add esp, 4
do we add 4 bytes to esp? Why?
Yes, this adds 4 bytes to the ESP
register. Why? To clean up the stack. Recall that before CALL
ing the printf
function, you PUSH
ed a 4-byte value (the offset of the string data in the executable image) on the stack. The printf
function is variadic (takes a variable number of arguments), so the caller is always responsible for cleaning up the stack after calling it.
Here, you can think of adding 4 to ESP
as equivalent to popping the stack with a POP
instruction. On x86, the stack always grows downwards, so adding is equivalent to popping (and the inverse of pushing).
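The arithmetic can be sketched in a few lines of C. This models only the value of the stack pointer, and `esp_after_cdecl_call` is a name I made up for illustration:

```c
#include <stdint.h>

/* Tracks what happens to ESP around a caller-cleans-up call that
   passes one 4-byte argument. Only the pointer value is modelled. */
uint32_t esp_after_cdecl_call(uint32_t esp) {
    esp -= 4;  /* push offset aHelloWorld : the argument goes on the stack */
    esp -= 4;  /* call printf             : the return address is pushed   */
    esp += 4;  /* ret (inside printf)     : the return address is popped   */
    esp += 4;  /* add esp, 4              : the caller discards the arg    */
    return esp;
}
```

Whatever value you start with, you get the same value back, which is the whole point: after the cleanup, the stack pointer is exactly where it was before the argument was pushed.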
then mov eax, 1234h
what is 1234h here?
This instruction MOV
es the constant value 0x1234
(the h
means hexadecimal) into the EAX
register.
Why? Well, I can guess. In all of the x86 calling conventions, the EAX
register contains a function's return value. So it is very likely that the function's original code ended with return 0x1234;
.
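Putting the pieces together, my guess at the original source would be something along these lines. This is a reconstruction, not the actual code: the function name `say_hello` is invented, and the real function may well have been `main`.

```c
#include <stdio.h>

/* Hypothetical reconstruction of the disassembled function.
   The name is an assumption; only the body is implied by the listing. */
int say_hello(void) {
    printf("Hello world\n");  /* push offset aHelloWorld / call printf / add esp, 4 */
    return 0x1234;            /* mov eax, 1234h */
}
```

The return value travels back to the caller in `EAX`, which is why the `MOV` appears just before the epilogue.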
then pop ebx
..it was pushed at the beginning. is it necessary to pop it at the end?
Actually, it pops EBP
, which is what was actually pushed at the beginning of the function.
And yes. Everything that you PUSH
onto the stack has to be POP
ed off the stack. (Or equivalent, as we saw earlier with ADD
ing to ESP
.) You have to clean up the stack. This is the function epilogue that corresponds to the prologue that we saw at the beginning. Refer back to Matt's article, where it talks about "Procedure Entry and Exit".
then retn
( i knew about ret
for returning a value after calling a function). I read that the n in retn refers to the number of pushed arguments by the caller.
This is just an idiosyncrasy of your disassembler again. IDA Pro uses the retn
mnemonic. This actually means a near return, but since x86-32 uses a flat (non-segmented) memory model, the near vs. far distinction is not relevant. You can think of retn
as simply being equivalent to ret
.
Note that this is distinct from the ret
instruction that takes an argument, which is what you're thinking of. It doesn't "return" its argument, though. The function returns its result in the EAX
register. Rather, ret n
(where n
is a 16-bit immediate value) returns and pops the specified number of bytes off the stack. This is used only for certain calling conventions (most commonly __stdcall
) where the callee is responsible for cleaning up the stack.
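As a rough sketch of the difference, again modelling only the stack pointer in C (`esp_after_ret` is a hypothetical helper, not a real API):

```c
#include <stdint.h>

/* Returns the value of ESP after executing `ret n`, given ESP just
   before the instruction (pointing at the return address).
   Pass n = 0 to model a plain `ret`. */
uint32_t esp_after_ret(uint32_t esp_at_ret, uint16_t n) {
    esp_at_ret += 4;  /* pop the 4-byte return address          */
    esp_at_ret += n;  /* then discard n bytes of arguments      */
    return esp_at_ret;
}
```

With one 4-byte argument on the stack, a plain `ret` leaves that argument for the caller to remove (hence the `add esp, 4` in your listing), whereas `ret 4` would have removed it in the callee.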
See links in the x86 tag wiki and Wikipedia for more information on calling conventions.
It isn't very clear for me.
Can you help me to understand?
Did I mention you should get a book that teaches assembly language programming?
printf
from a dynamic library. So if you were somehow (by some attack) to inject a malicious DLL during execution of this, with a patched, malicious printf
version, it may do a lot of harm (at least in the context under which you run the hello world, unless that malicious code uses some other bug to escalate its privileges and escape the current context, etc.). ... so much for the "no harm" ... :D – Ped7g