12
votes

I'm very curious how assembly languages work. I'm keeping this general because I'm not talking only about Intel x86 assembly (although it's the only one I'm remotely familiar with). To be a bit more clear...

mov %eax,%ebx

How does the computer know what an instruction like "mov" does? How does it know that eax and ebx are registers? Do people write grammars for assembly languages? How do they write this? I imagine nothing is stopping someone from writing an assembly language that substitutes the mov instruction with something like dog or horse etc., (obviously this isn't semantic at all)

Sorry if this isn't too clear, but it's something I find a bit puzzling. I know it can't be magic, but I can't see how it works. I've looked up some stuff on Wikipedia, but all it seems to say is that the assembler translates it down to machine code. What I'm asking, I suppose, is how that translation occurs.

Thoughts?

EDIT: I realize that this stuff is defined in reference manuals and things. I guess what I wish to know is how you tell your processor "Okay, when you see mov you're gonna do this". I also know that it's probably a sequence of a ton of logic gates, but there has to be some way for the processor to recognize that mov is the symbol that means "use these logic gates".

5
You are asking too broad a question. Typically assembly statements are packed into an instruction word "stream". Exactly how the CPU decodes them is beyond a simple answer. - Anycorn

5 Answers

21
votes

Computers are basically built out of logic gates. Though this is an abstract idealization of the real physical machinery, it is close enough to the truth that we can believe it for now. At a very basic level, these things work just like true/false predicates. If you've ever played Minecraft, they work a lot like redstone. The field which studies how to put together logic gates to make interesting complex circuits, like computers, is called computer architecture. It is traditionally viewed as a mixture of computer science and electrical engineering.

The most basic logic gates are things like AND and OR, which just take bits and smash out some boolean operation between them. By creating feedback loops in logic gates you can store memory. One standard type of memory circuit is called a flip-flop, and it is basically a little loop of wire together with a few gates (typically NAND or NOR) and power to keep it stable. Putting together multiple flip-flops lets you create bit vectors, and these things are called registers (which are what things like eax and ebx represent). There are also many other types of parts, like adders, multiplexers and so on, which implement various pieces of boolean logic. Here is a directory of some circuits:

http://www.labri.fr/perso/strandh/Teaching/AMP/Common/Strandh-Tutorial/Dir.html
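To make the "feedback loop stores a bit" idea concrete, here is a rough Python sketch of a one-bit latch and a register built from several of them. It assumes the classic cross-coupled NOR arrangement (an SR latch); the names nor, SRLatch and Register are made up for this illustration, and real hardware settles continuously rather than in a loop like this.

```python
def nor(a: int, b: int) -> int:
    """NOR gate: the output is 1 only when both inputs are 0."""
    return 0 if (a or b) else 1


class SRLatch:
    """One bit of storage: two NOR gates feeding each other's inputs."""

    def __init__(self) -> None:
        self.q = 0       # the stored bit
        self.q_bar = 1   # its complement

    def update(self, s: int, r: int) -> int:
        # Re-evaluate both gates until the outputs stop changing,
        # imitating the feedback loop settling into a stable state.
        for _ in range(4):
            new_q = nor(r, self.q_bar)
            new_q_bar = nor(s, new_q)
            if (new_q, new_q_bar) == (self.q, self.q_bar):
                break
            self.q, self.q_bar = new_q, new_q_bar
        return self.q


class Register:
    """A bit vector made of several latches (hypothetical helper)."""

    def __init__(self, width: int = 8) -> None:
        self.bits = [SRLatch() for _ in range(width)]

    def write(self, value: int) -> None:
        for i, latch in enumerate(self.bits):
            bit = (value >> i) & 1
            latch.update(s=bit, r=1 - bit)  # set or reset this bit
            latch.update(s=0, r=0)          # release inputs; the latch holds

    def read(self) -> int:
        return sum(latch.q << i for i, latch in enumerate(self.bits))


latch = SRLatch()
latch.update(s=1, r=0)          # set
print(latch.update(s=0, r=0))   # inputs released, the bit is remembered

reg = Register(8)
reg.write(0xA5)
print(hex(reg.read()))
```

With both inputs at 0 the latch keeps whatever it last held, which is the whole trick behind storing state in gates.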

Your CPU is basically a bunch of these things stuck together, all built out of the same basic logic gates. The way that your computer knows how to keep on executing instructions is that there is a special piece of machinery called a clock which emits pulses at regular intervals. When your CPU's clock emits a pulse, it sets off a sequence of reactions in these logic gates that causes the CPU to execute an instruction. For example, when it reads an instruction that says "mov eax, ebx", what ends up happening is that the state of one of these registers (ebx) gets copied over to the state of another (eax) just in time before the next pulse comes out of the clock.
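As a very rough software picture of "one clock pulse, one instruction", here is a toy machine in Python. Everything about it is an assumption made for illustration: the 8-bit instruction word (4-bit opcode, 2-bit destination, 2-bit source), the opcode numbers, and the register numbering have nothing to do with real x86 encodings.

```python
# Hypothetical toy opcodes, not real x86 values.
OP_HALT, OP_MOV, OP_ADD = 0x0, 0x1, 0x2
EAX, EBX, ECX, EDX = 0, 1, 2, 3   # toy register numbers


def encode(opcode: int, dst: int = 0, src: int = 0) -> int:
    """Pack a toy instruction: 4-bit opcode, 2-bit dst, 2-bit src."""
    return (opcode << 4) | (dst << 2) | src


class ToyCPU:
    def __init__(self, program: list[int]) -> None:
        self.regs = [0, 0, 0, 0]
        self.pc = 0               # index of the next instruction word
        self.program = program
        self.halted = False

    def tick(self) -> None:
        """One clock pulse: fetch, decode and execute one instruction."""
        if self.halted:
            return
        word = self.program[self.pc]      # fetch
        self.pc += 1
        opcode = word >> 4                # decode
        dst = (word >> 2) & 0b11
        src = word & 0b11
        if opcode == OP_HALT:             # execute
            self.halted = True
        elif opcode == OP_MOV:
            self.regs[dst] = self.regs[src]
        elif opcode == OP_ADD:
            self.regs[dst] = (self.regs[dst] + self.regs[src]) & 0xFFFFFFFF
        else:
            raise ValueError(f"unknown opcode {opcode:#x}")


cpu = ToyCPU([encode(OP_MOV, EAX, EBX), encode(OP_ADD, EAX, EBX), encode(OP_HALT)])
cpu.regs[EBX] = 7
while not cpu.halted:
    cpu.tick()
print(cpu.regs[EAX])
```

The if/elif chain stands in for the decoding circuitry: in hardware the opcode bits directly select which wires get connected, so nothing "looks up" what mov means at run time.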

Of course this is a gross oversimplification, but as a high-level picture it is essentially correct. The rest of the details take a while to explain, and there are a few things here that I left out because they are unnecessarily subtle (for example, in a real CPU sometimes multiple instructions get executed in a single clock cycle; due to register renaming, eax isn't always the same physical register; and due to reordering, the order in which instructions get executed is sometimes changed, and so on). However, it is definitely worth learning the whole story since it is actually quite amazing (or at least I like to think so!). You would be doing yourself a great favor to go out and read up on this stuff, and maybe try building a few circuits of your own (either using real hardware, a simulator, or even Minecraft!).

Anyway, hope that answers a bit of your question about what mov eax, ebx does.

9
votes

What you see there are mnemonics, which make it easy for a programmer to write assembly; the code is, however, not executable in mnemonic form. When you pass these assembly instructions through an assembler, they are translated into the machine code they represent, which is what the CPU and its various co-processors interpret and execute (the CPU generally breaks it down further into smaller units called micro-ops).

If you're curious as to how exactly it does that, well that's a long process, but this has all that information.

All the semantics, etc. are handled by the assembler, which checks for validity and integrity where possible (one can still assemble invalid code, however!). This basically makes assembly a low-level language, even though it has a 1-to-1 correspondence to the machine code that is output (except when using macro-based assemblers, but then the macros still expand 1-to-1).
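To show how thin that translation layer can be, here is a minimal sketch of an "assembler" in Python for exactly one case: a 32-bit register-to-register mov written in Intel syntax (destination first). It relies on the x86 form MOV r/m32, r32 (opcode 0x89 followed by a ModRM byte); the function name assemble and the tables are made up for this example, and a real assembler handles vastly more forms, sizes and prefixes.

```python
# x86 register numbers as used in the ModRM byte.
REGISTERS = {"eax": 0, "ecx": 1, "edx": 2, "ebx": 3,
             "esp": 4, "ebp": 5, "esi": 6, "edi": 7}

# Mnemonic -> opcode byte. Only the MOV r/m32, r32 form is covered here.
MNEMONICS = {"mov": 0x89}


def assemble(line: str, mnemonics: dict[str, int] = MNEMONICS) -> bytes:
    """Translate e.g. 'mov eax, ebx' into its two machine-code bytes."""
    mnemonic, operands = line.strip().split(None, 1)
    dst, src = [op.strip().lower() for op in operands.split(",")]
    opcode = mnemonics[mnemonic.lower()]
    # ModRM byte: mod=11 (register direct), reg=source, rm=destination.
    modrm = (0b11 << 6) | (REGISTERS[src] << 3) | REGISTERS[dst]
    return bytes([opcode, modrm])


print(assemble("mov eax, ebx").hex())                  # 89d8
# The mnemonic is only a name in a table; any word would do.
print(assemble("dog eax, ebx", {"dog": 0x89}).hex())   # 89d8
```

The second call is the point raised in the question: the CPU only ever sees the bytes 89 D8, so calling the instruction dog instead of mov changes the assembler's table and nothing else.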

7
votes

Your CPU doesn’t execute assembly. The assembler converts it into machine code. This process depends on both the particular assembly language and the target computer architecture. Generally those go hand in hand, but you might find different flavors of assembly language for the same architecture (the Intel-style syntax used by NASM vs. AT&T syntax, for example), which all translate into similar machine code.

A typical (MIPS) assembly instruction such as “And immediate”

andi $t, $s, imm

would become the 32-bit machine code word

0011 00ss ssst tttt iiii iiii iiii iiii

where s and t are numbers from 0–31 which name registers, and i is a 16-bit value. It’s this bit pattern that the CPU actually executes. The 001100 at the beginning is the opcode corresponding to the andi instruction, and the layout of the bits that follow (here a 5-bit source register, a 5-bit target register, and a 16-bit literal) varies depending on the instruction. When this instruction is fetched by the CPU, it responds appropriately by decoding the opcode, selecting the registers to be read and written, and configuring the ALU to perform the necessary arithmetic.
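The bit layout above can be reproduced with a few shifts. The following Python sketch packs and unpacks the andi word exactly as described; the function names encode_andi and decode_itype are invented for this example, and it only covers this one instruction format.

```python
ANDI_OPCODE = 0b001100  # the 6-bit opcode from the bit pattern above


def encode_andi(t: int, s: int, imm: int) -> int:
    """Build the 32-bit word for: andi $t, $s, imm."""
    if not (0 <= t < 32 and 0 <= s < 32):
        raise ValueError("register numbers must be in 0..31")
    if not 0 <= imm < (1 << 16):
        raise ValueError("immediate must fit in 16 bits")
    # opcode (6 bits) | s (5 bits) | t (5 bits) | immediate (16 bits)
    return (ANDI_OPCODE << 26) | (s << 21) | (t << 16) | imm


def decode_itype(word: int) -> dict[str, int]:
    """Split a 32-bit word back into the fields the CPU would look at."""
    return {
        "opcode": word >> 26,
        "s": (word >> 21) & 0x1F,
        "t": (word >> 16) & 0x1F,
        "imm": word & 0xFFFF,
    }


# andi with target register 8, source register 9, immediate 0xFF
word = encode_andi(t=8, s=9, imm=0xFF)
print(f"{word:032b}")
print(f"{word:#010x}")       # 0x312800ff
print(decode_itype(word))
```

The decode function is a software stand-in for what the hardware does with wiring: particular bit positions of the instruction word are routed straight to the register file and the ALU.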

3
votes

The instructions in assembly code map to the actual instruction set and register names for the CPU architecture you're targeting. mov is an x86 instruction, and eax and others are the names of (in this case general-purpose) registers defined in the Intel x86 reference manual.

Same thing for other architectures - the assembly code maps quite directly to the actual names of the operations as defined in the chip's specifications/documentation.

That mapping is much simpler than, for instance, compiling C code.

1
votes

First, every instruction such as mov or add has its own meaning in binary form, for example 10101010, 00110000 or 10100, and that binary form is what the CPU actually understands.

But humans can't remember all of those patterns, so for programming purposes English-like names are used instead. These are ultimately converted back into their binary form.

Second, the conversion from the English-like names (mov, add, etc.) to binary happens when the code is assembled or compiled. After that, the binary instructions are stored in RAM, ready for execution.

I know this may not fully answer your question.

If you want to picture exactly how the CPU executes instructions and works on them, you can learn it visually from these videos on YouTube:

https://m.youtube.com/watch?v=cNN_tTXABUA&itct=CCUQpDAYAyITCOHa_9e_q80CFZ1Vvgodek8KmzILYzQtb3ZlcnZpZXdaGFVDNmVhVm43MzQ5TFJoNXl6cFhqZXU4QQ%3D%3D&client=mv-google&gl=IN&hl=en-GB

https://m.youtube.com/watch?v=NKYgZH7SBjk&itct=CBoQpDAYAiITCOHa_9e_q80CFZ1Vvgodek8KmzILYzQtb3ZlcnZpZXdaGFVDNmVhVm43MzQ5TFJoNXl6cFhqZXU4QQ%3D%3D&client=mv-google&gl=IN&hl=en-GB

Watch them once and I promise things will be much clearer. Have a look.