3
votes

I am having trouble finding a good place to start learning assembly. I have found lots of conflicting information on the internet about what assembly actually is, which assembler to use, what an assembler is, and whether there is one "core" assembly language released by Intel for its CPU families (I have an Intel x86 CPU, so that is what I want to learn assembly for).

Could someone please clear up the points above? From what I have heard, Intel releases CPU families (x86, for instance) with an instruction set/reference, and the various assembler programs (MASM, FASM, NASM, etc.) provide a higher-level, human-readable language which is used to produce machine code instructions.

Also, from what I have heard, when someone says "assembly language", this actually refers to one of many different styles of assembly language provided by the many different assemblers out there (see the MASM-style vs NASM-style examples at http://en.wikipedia.org/wiki/X86_assembly_language#Examples).

What I am looking for is "the first" assembler, without the variations that MASM, NASM, etc. offer (such as their large libraries of macros). All these assemblers must have come from somewhere, and that is what I am looking for.

Basically, I am looking for the first x86 assembler/assembly language, before MASM, NASM etc. Could someone provide me with a link to this first assembler?

BTW, in case my entire understanding of assembly is wrong, could someone clarify?

Thanks in advance,

Prgrmr

4
NASM is probably the better choice for resembling the first x86 assembly language. Using the real/original tools would involve DOSBox or some other DOS emulator. - old_timer
The original assembly language comes from the chip vendor: amazon.com/Manual-Programmers-Hardware-Reference-240487-001/dp/… You should be able to see that it is incomplete; the gaps were filled in by the various assemblers however they wanted. - old_timer

4 Answers

7
votes

To be pedantic, the real language that you would use to talk to a CPU directly is machine code. This would mean figuring out the actual byte values that must be used for each instruction. This is obviously far too tedious and error-prone, so people use an assembler instead. An assembler translates a text representation of the machine code into the machine code itself, and takes care of the various fiddly details like calculating relative addresses.
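To make "calculating relative addresses" concrete, here is a minimal sketch in Python of a toy two-pass assembler. It handles only a hypothetical three-instruction subset of 32-bit x86 (nop, mov eax with an immediate, and a short jmp to a label), and the function names are made up for illustration. The encodings it emits (90; B8 followed by a 32-bit little-endian immediate; EB followed by an 8-bit displacement) are the real ones, and the jump displacement is measured from the end of the jmp instruction, which is exactly the kind of detail an assembler saves you from working out by hand.

```python
import struct

# Toy assembler for a hypothetical subset: "nop", "mov eax, <imm32>",
# "jmp <label>" (short form, 8-bit displacement) and "label:" lines.
def size_of(line):
    """Bytes an instruction occupies (pass 1 needs this to place labels)."""
    if line == "nop":
        return 1                      # 90
    if line.startswith("mov eax,"):
        return 5                      # B8 + 4-byte immediate
    if line.startswith("jmp "):
        return 2                      # EB + 1-byte displacement
    raise ValueError(f"unsupported: {line}")

def assemble(source, origin=0):
    lines = [l.strip() for l in source.splitlines() if l.strip()]
    # Pass 1: record the address of every label.
    labels, addr = {}, origin
    for line in lines:
        if line.endswith(":"):
            labels[line[:-1]] = addr
        else:
            addr += size_of(line)
    # Pass 2: emit bytes, now that every label address is known.
    out, addr = bytearray(), origin
    for line in lines:
        if line.endswith(":"):
            continue
        if line == "nop":
            out += b"\x90"
        elif line.startswith("mov eax,"):
            imm = int(line.split(",")[1], 0)
            out += b"\xb8" + struct.pack("<I", imm & 0xFFFFFFFF)  # little-endian
        elif line.startswith("jmp "):
            target = labels[line.split()[1]]
            rel = target - (addr + 2)     # relative to the NEXT instruction
            if not -128 <= rel <= 127:
                raise ValueError("short jump out of range")
            out += b"\xeb" + struct.pack("<b", rel)
        addr += size_of(line)
    return bytes(out)

code = assemble("""
start:
    mov eax, 1
    nop
    jmp start
""")
print(code.hex(" "))
```

Running it prints b8 01 00 00 00 90 eb f8: the jmp sits at offset 6 and the following instruction would start at offset 8, so jumping back to start (offset 0) needs a displacement of -8, i.e. 0xf8.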

For a particular machine code there can be a number of different assemblers, each with their own idea of how the assembly should be written. This is particularly true of x86 processors - broadly, there are two styles: Intel and AT&T. And then within those, different assemblers can have different sets of macros and directives and so on.

To illustrate, here is a sample of assembly generated from some C code with gcc -S -masm=intel:

    cmp     eax, ebx
    jl      .L63
    mov     eax, DWORD PTR inbuffd
    mov     DWORD PTR [esp+8], 8192
    mov     DWORD PTR [esp+4], OFFSET FLAT:inbuf
    mov     DWORD PTR [esp], eax
    call    read
    cmp     eax, -1
    mov     ebx, eax
    mov     DWORD PTR inbytes, eax
    je      .L64
    test    eax, eax
    je      .L36
    mov     eax, 1
    xor     edx, edx
    jmp     .L33

And here is the same snippet generated with gcc -S -masm=att:

    cmpl    %ebx, %eax
    jl      .L63
    movl    inbuffd, %eax
    movl    $8192, 8(%esp)
    movl    $inbuf, 4(%esp)
    movl    %eax, (%esp)
    call    read
    cmpl    $-1, %eax
    movl    %eax, %ebx
    movl    %eax, inbytes
    je      .L64
    testl   %eax, %eax
    je      .L36
    movl    $1, %eax
    xorl    %edx, %edx
    jmp     .L33

Those two snippets produce the same machine code - the difference is only in the assembly syntax. Note in particular how the order of arguments is different (Intel is destination-first, AT&T is source-first), the slight differences in instruction names, the use of % to specify registers in AT&T, and so on.
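For the simplest cases the operand-order difference is mechanical enough to automate. Below is a hedged Python sketch (att_to_intel and convert_operand are hypothetical names) that converts AT&T instructions using only registers, numeric immediates, or a bare jump/call target into Intel syntax. Memory operands such as 8(%esp) and symbolic immediates such as $inbuf are deliberately rejected, because translating them needs the size and OFFSET information that the Intel listing above spells out.

```python
# Mnemonics whose AT&T size suffix (b/w/l/q) this sketch knows how to drop.
BASES = {"mov", "cmp", "test", "xor", "add", "sub"}

def convert_operand(op, sole):
    """Translate one AT&T operand; only registers, numbers and jump targets."""
    if op.startswith("%"):
        return op[1:]                      # register: %eax -> eax
    if op.startswith("$"):
        int(op[1:], 0)                     # raises ValueError unless numeric
        return op[1:]                      # immediate: $8192 -> 8192
    if sole and "(" not in op:
        return op                          # jump/call target: .L63, read
    raise ValueError(f"operand not handled by this sketch: {op}")

def att_to_intel(line):
    mnemonic, _, rest = line.strip().partition(" ")
    if mnemonic[:-1] in BASES and mnemonic[-1] in "bwlq":
        mnemonic = mnemonic[:-1]           # cmpl -> cmp, movl -> mov
    ops = [op.strip() for op in rest.split(",")] if rest.strip() else []
    converted = [convert_operand(op, len(ops) == 1) for op in ops]
    # AT&T is source-first, Intel is destination-first: reverse the order.
    return (mnemonic + " " + ", ".join(reversed(converted))).strip()

for att in ["cmpl    %ebx, %eax", "jl      .L63", "cmpl    $-1, %eax",
            "movl    %eax, %ebx", "testl   %eax, %eax", "movl    $1, %eax",
            "xorl    %edx, %edx", "call    read"]:
    print(f"{att:22} ->  {att_to_intel(att)}")
```

Each converted line matches the corresponding line of the gcc -masm=intel listing above.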

And then there are the different CPUs. A CPU has a certain architecture, which means it executes the instruction set for that architecture. For that architecture there will be a core instruction set, and possibly extra groups of instructions for enhanced features or special applications. x86 is a fine example: you have the floating-point (x87) instructions, MMX, 3DNow! and the SSE family (SSE, SSE2, SSE3, SSSE3 and SSE4). Different CPUs of that architecture may or may not be able to understand the extra instructions; generally there is some way to ask the CPU what it supports (on x86, the CPUID instruction).
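On x86, a program asks by executing CPUID and then inspecting the bits it returns in registers. Python cannot execute CPUID itself, so this sketch only shows the decoding half: given EDX/ECX values, it lists which of the extensions mentioned above are flagged. The bit positions are the documented ones for leaf 1 and for AMD's extended leaf 0x80000001; the function names are hypothetical, and the sample register values are invented for illustration rather than taken from a real CPU.

```python
# Feature bits reported by CPUID leaf 1 (EAX=1) in EDX and ECX.
LEAF1_EDX = {0: "x87 FPU", 23: "MMX", 25: "SSE", 26: "SSE2"}
LEAF1_ECX = {0: "SSE3", 9: "SSSE3", 19: "SSE4.1", 20: "SSE4.2"}
# 3DNow! is reported in extended leaf 0x80000001, EDX bit 31 (AMD CPUs).
EXT1_EDX = {31: "3DNow!"}

def set_bits(table, value):
    """Names of the features whose bit is set in a register value."""
    return [name for bit, name in sorted(table.items()) if (value >> bit) & 1]

def supported_extensions(edx1, ecx1, edx_ext=0):
    """Decode register values that a CPUID-executing program would hand us."""
    return (set_bits(LEAF1_EDX, edx1) + set_bits(LEAF1_ECX, ecx1)
            + set_bits(EXT1_EDX, edx_ext))

# Made-up register values for illustration, not output from a real CPU:
edx = (1 << 0) | (1 << 23) | (1 << 25) | (1 << 26)
ecx = 1 << 0
print(supported_extensions(edx, ecx))
```

With these sample values it reports x87 FPU, MMX, SSE, SSE2 and SSE3, and no 3DNow! since the extended-leaf value defaults to 0.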

When you say "x86 assembly" what people understand you to mean is, "assembly that will run on any CPU of the x86 architecture".

More sophisticated CPUs, particularly those with memory management (x86 included), do more than simply execute instructions. Starting with the 80286, the x86 architecture has two main modes: real mode and protected mode. The core instruction set can be used as-is in either mode, but the way memory works in each mode is so completely different that it is impractical to try to write real-world code that would work in either mode.
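Here is a minimal sketch of the real-mode side of that difference. In real mode a physical address is simply segment * 16 + offset, so many segment:offset pairs name the same byte. In protected mode the segment register instead holds a selector into a descriptor table, so this arithmetic no longer applies, which is one reason code written for one mode does not carry over to the other. Python is used only so the sketch runs anywhere; the function name is made up.

```python
def real_mode_address(segment, offset, wrap_at_1mb=True):
    """Physical address for segment:offset in real mode: segment * 16 + offset.

    The 8086 had 20 address lines, so results above 1 MB wrapped around;
    wrap_at_1mb models that (on later machines the A20 gate controls it).
    """
    if not (0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF):
        raise ValueError("segment and offset are 16-bit values")
    address = (segment << 4) + offset
    return address & 0xFFFFF if wrap_at_1mb else address

print(hex(real_mode_address(0xB800, 0x0000)))   # colour text-mode video memory
print(hex(real_mode_address(0x07C0, 0x0000)), hex(real_mode_address(0x0000, 0x7C00)))
print(hex(real_mode_address(0xFFFF, 0x0010)), hex(real_mode_address(0xFFFF, 0x0010, False)))
```

The second line shows two different segment:offset pairs resolving to the same address, 0x7c00; the third shows the 8086 wrap-around at 1 MB.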

Later CPUs introduced more modes. The 386 introduced Virtual 8086 mode aka v86 mode to allow a protected mode operating system to run a real-mode program without having to actually switch the CPU to real mode. AMD64 processors run 64-bit code in long mode.

A CPU can support multiple architectures. Itanium (IA-64) is considered a separate architecture, and the early Itanium processors released by Intel also included hardware support for x86 code, with the ability to switch between them.

The x86 family is probably an overly complicated example of an assembly language: it has a terribly long and complex history going back 33+ years. The machine code for the core instructions used in (32-bit) applications is the same as for the 8086, released in 1978. The architecture has been through several revisions, each adding more instructions.
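One way to see that continuity is the encoding of a single instruction. The 8086 opcode B8 ("mov ax, imm16") is still B8 in 32-bit code, where it means "mov eax, imm32"; later CPUs pick the default operand size from the mode and use the 0x66 operand-size prefix to select the other size. A small Python sketch covering only this one opcode (the function name is hypothetical):

```python
import struct

def encode_mov_acc_imm(value, mode_bits, operand_bits):
    """Encode 'mov ax, imm16' / 'mov eax, imm32' (opcode B8) for a code segment
    whose default operand size is mode_bits (16 or 32)."""
    if mode_bits not in (16, 32) or operand_bits not in (16, 32):
        raise ValueError("this sketch only covers 16- and 32-bit sizes")
    # 0x66 is the operand-size override prefix: it selects the non-default size.
    prefix = b"" if operand_bits == mode_bits else b"\x66"
    immediate = struct.pack("<H" if operand_bits == 16 else "<I", value)
    return prefix + b"\xb8" + immediate

print(encode_mov_acc_imm(1, 16, 16).hex(" "))  # 8086-style: mov ax, 1
print(encode_mov_acc_imm(1, 32, 32).hex(" "))  # 32-bit code: mov eax, 1
print(encode_mov_acc_imm(1, 32, 16).hex(" "))  # 32-bit code: mov ax, 1
```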

If you want to learn x86 assembly properly, consider:

  • The Art of Assembly Language Programming, which has had an edition for each of DOS, Windows and Linux. The Windows and Linux versions use a language invented by the author called High Level Assembly (HLA), which is sort of like x86 assembly but not really. This may or may not be your cup of tea: it's not strictly real assembly, but the concepts are all there, and learning to write proper assembly afterward would not take much effort. To its credit, it also contains a LOT of assembly-related material, e.g. information on processor architecture, BIOS, video etc. The DOS version teaches straight MASM (Intel) assembly.

  • Programming from the Ground Up teaches AT&T-style assembly on Linux.

For actual assemblers (free ones), try MASM32 (Intel style) on Windows, or as (the GNU assembler) on Linux. As it happens, GNU as will assemble either Intel or AT&T style assembly (Intel syntax is selected with the .intel_syntax directive).

If you feel daunted by the x86 architecture and are willing to learn assembly for some other architecture, consider starting with something smaller.

3
votes

In addition to Michael Slade's excellent answer, here is some historical information:

The first x86 assembler was called "ASM86." It was produced by Intel and originally ran on their 8-bit "ISIS" operating system. A later version that runs under DOS has been preserved by WinWorld, an online software history museum. You can find it here. The accompanying manual archive includes Intel's 1985 reference manual for the ASM86 dialect. It supports familiar directives such as ASSUME, SEGMENT, DB/DW, END, and so on, as well as higher-level macros.

The oldest x86 assembly language reference I've been able to find online is Intel's MCS-86 Macro Assembly Language Manual from 1979. A PDF copy has been preserved by BitSavers here.

One of the designers of the original ASM86, Eric Isaacson, went on to write A86, a kind of spiritual successor. The dialect of A86 is very similar to ASM86, but with a lot of the fussiness about ASSUME and SEGMENT directives and suchlike (Eric Isaacson refers to them as "red tape") relaxed or eliminated. A86 may be better than ASM86 at providing the spirit of bare-metal assembly language that the OP seems to be looking for. A86 is 16-bit only; to run it you need a DOS emulator or a machine running an older version of Windows (I have an old IBM ThinkPad X23 that still runs Windows XP, and I have been running A86 in a DOS box on it without any problems).

Finally, there is a fascinating blog post about building the original IBM PC BIOS using ASM86 on the ISIS-2 platform at the OS/2 Museum.

0
votes

I don't think there is such a thing as the core assembler. Each of them has its own dialect. Also you probably need to consider which OS you want to write code for before you choose.

This seems to be a good article that may help you choose which one to start with: http://webster.cs.ucr.edu/AsmTools/WhichAsm.html

0
votes

It is hard to add to Michael Slade's answer, but I do have a few comments.

Each processor vendor (or whoever creates a processor's machine code) does so by defining mnemonics, i.e. an assembly language for that processor. Typically the assembly language defined in the original processor documentation, whether sketched on a napkin at lunch or published as a very formal and pretty document, is the "original" assembly language for that processor. The assembler (a loose term that can be understood differently; here it means the program that parses the assembly language and, ideally, makes machine code from it) is written to read that assembly language, plus additional items required to produce the code properly and some directives to make the programmer's job easier (macros, equates (defines), etc.).
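As an illustration of the "equates (defines)" directive, here is a hedged Python sketch of a preprocessing pass that handles NASM-style `NAME equ VALUE` lines by plain textual substitution. Real assemblers evaluate equates as expressions and keep them in a symbol table, so treat this as a toy; the function name is hypothetical.

```python
import re

EQU_LINE = re.compile(r"^\s*(\w+)\s+equ\s+(.+?)\s*$", re.IGNORECASE)

def expand_equates(source):
    """Record 'NAME equ VALUE' lines and substitute NAME in later lines."""
    equates, output = {}, []
    for line in source.splitlines():
        match = EQU_LINE.match(line)
        if match:
            equates[match.group(1)] = match.group(2)   # directive: emits no code
            continue
        for name, value in equates.items():
            # Whole-word replacement so BUFSIZE does not match BUFSIZE2.
            line = re.sub(rf"\b{re.escape(name)}\b", lambda _m, v=value: v, line)
        output.append(line)
    return "\n".join(output)

print(expand_equates("BUFSIZE equ 8192\nmov dword [esp+8], BUFSIZE"))
```

The equ line itself produces no output; it only changes how later lines are read, which is what distinguishes a directive from an instruction.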

Ideally, if you are creating a new processor and you want to get any kind of acceptance, you first need an assembler and later other languages (FORTRAN, BASIC, Pascal, C, on into the present). C is always needed, but obviously today you don't need Pascal or BASIC, etc. If the processor vendor wants to sell chips, it needs to make, contract for, or in some way encourage an assembler at a minimum. With respect to the 8088/8086, Intel did have its own tools, but they were pricey at the time, and other tools were more popular (Microsoft MASM and MSVC; Borland TASM, Pascal, TCC and BCC). There was a good free assembler called A86, if I remember right. Now we have NASM as an example of a good free assembler for x86.

Intel x86 is more of an exception than the rule: there is a religious debate between the Intel syntax, which is closer to the original, and the AT&T syntax. GNU binutils tends not to honor the processor vendors' syntax (personally I would use the word disrespect) and makes changes. x86 is the worst case, as binutils uses AT&T syntax by default, although it also supports Intel syntax in some (maybe all) of its tools. Another example: assemblers have long used a semicolon ';' to end the code on a line, with anything after it treated as a comment, but for ARM, binutils treats ';' as a statement separator (starting a fresh instruction) and uses @ as the comment marker. I understand that it is individuals who write the backends, and sometimes those individuals are the chip vendors themselves; it is not one organized group doing these things. One person or group does the initial work, and the rest, if they accept it, take the working parts and build upon them.
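To make the comment-character difference concrete, here is a simplified Python sketch that splits one source line into statements under two conventions: NASM, where ';' starts a comment, and GNU as for ARM, where '@' starts a comment and ';' separates statements. It ignores string and character literals, and the dialect names "nasm" and "arm-gas" are labels made up for this example.

```python
def split_statements(line, dialect):
    """Strip the comment from a source line and split it into statements."""
    if dialect == "nasm":
        comment, separator = ";", None     # ';' starts a comment
    elif dialect == "arm-gas":
        comment, separator = "@", ";"      # '@' starts a comment, ';' separates
    else:
        raise ValueError(f"unknown dialect: {dialect}")
    code = line.split(comment, 1)[0]
    parts = code.split(separator) if separator else [code]
    return [part.strip() for part in parts if part.strip()]

print(split_statements("mov eax, 1 ; load one", "nasm"))
print(split_statements("mov r0, #1 ; mov r1, #2 @ two instructions", "arm-gas"))
```

The same ';' that hides the rest of a NASM line introduces a second instruction for ARM gas.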

Like the comment symbol, over time assemblers for different processors have used similar or identical directives: additional tokens that are not machine code. For example, ORG or .ORG indicates an address. Since you sometimes need the physical address where the machine code lives in order to encode an instruction, the user needs some way to indicate that address. Back in the day, you wrote one asm program, perhaps in a single file or a single file with includes, and the assembler output a complete binary rather than an incomplete object, so you needed that address. This is why you don't see ORG used that way in GNU assembler (gas) code (gas's .org only advances the location counter within the current section): gas creates objects that leave the address-specific instructions incomplete, both because the address is not yet known and because unknown labels must be resolved at link time. The linker is in part an assembler, as it does the final steps of encoding those remaining instructions; it does not do this from ASCII assembly source code, but from data in the object file format.
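A minimal Python sketch of that division of labour, assuming a made-up and drastically simplified object format (real formats such as ELF or COFF carry much more). The "assembler" leaves a zero placeholder for the absolute address of msg in `mov eax, msg` (opcode B8 followed by a 32-bit address) together with a relocation record, and the "linker" writes in the address once the final layout is known. With a flat binary and ORG, the assembler would have needed that address itself. The Relocation class, the link function and the 0x7C10 address are all hypothetical.

```python
import struct
from dataclasses import dataclass

@dataclass
class Relocation:
    offset: int      # where in the code the 4-byte address must be written
    symbol: str      # which symbol's final address goes there

def link(code, relocations, symbol_addresses):
    """Patch absolute 32-bit addresses into an 'object' once addresses are known."""
    out = bytearray(code)
    for r in relocations:
        struct.pack_into("<I", out, r.offset, symbol_addresses[r.symbol])
    return bytes(out)

# 'mov eax, msg' as an assembler might leave it in an object: B8 + placeholder.
obj_code = bytes.fromhex("b8 00 00 00 00")
relocs = [Relocation(offset=1, symbol="msg")]
# The "linker" supplies the address later (0x7C10 is an arbitrary example).
final = link(obj_code, relocs, {"msg": 0x7C10})
print(final.hex(" "))
```

The placeholder bytes become 10 7c 00 00, the little-endian form of 0x7C10.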

x86 is absolutely the last assembly language I would recommend you learn; it is more of an interesting history lesson. The processors have evolved so much, changing at every step and becoming microcoded very early (most processors ARE NOT microcoded; x86, due to its ugly assembly/machine language, almost required it to compete).

Having an x86 is not a good reason to learn x86. You want to learn an instruction set for which you have tools that can peer into the processor. Sure, with a debugger you can single-step, but having a simulator that you can manipulate to output anything and watch anything in any way you wish, or even better a logic simulator where you can see everything at once, is going to make learning assembly language far less painful. Less pain means you should enjoy it more and stick with it rather than give up. Although basic programming skills are required as with any language, assembly allows you to get yourself into trouble quickly and easily. Also, you don't want to be crashing your computer or anything like that. (Here again, if you get to the point where you feel you need to make system calls from asm, use something like PCEmu or DOSBox, or later VirtualBox, VMware or QEMU, to run a virtual machine, which causes you less pain when it crashes.)