2
votes

At the end of the day every piece of code we write eventually gets turned into assembler and then machine language.

If you were writing assembler and wanting to perform a simple connection between two computers, how would you know which memory addresses to use (let alone offsets) within the assembler? Would you need to know specific addresses relating to the operating system?

I'm just wondering how somebody would write a really "clean" and "efficient" message passing library/compiler- the thing which is getting me is what on earth would network communications/IPC look like in assembler?

I think part of this answer could lie with querying known addresses relating to the OS? For example 0x4545456 to 0x 60000000 contains the Linux kernel data for communications X etc.

2
This is a very general question so I gave you a very long winded general answer. If you were more specific you'd get a more specific answer. Assembler doesn't automatically imply clean/efficient, it's more complicated than that.Guy Sirton

2 Answers

1
votes

The addresses are not specific to your OS. They are specific to your hardware/system. Accessing those has nothing to do with assembler vs. another programming language (e.g. C), in fact most device driver code (the code that actually interacts with the networking hardware) is typically written in C.

Here's just one random sample of a network (ethernet) controller:

Intel® 82580EB/82580DB GbE Controller: Datasheet

There are a bunch of registers that your software, either in assembler, or in another language, has to program to get this thing to actually communicate over ethernet. It's probably easier to start with a simpler example, something like a serial port. Let's build a hypothetical, fixed baud rate, serial port controller, mapped to memory:

Address  Meaning
0        RX status (reads 0 when no data to read, 1 a byte is available)
1        RX buffer
2        TX status (reads 0 when ready to send, 1 when busy)
3        TX buffer

Now your software, either in assembler or any other language, can transmit data to another computer, by monitoring (polling) address 2 until it's ready, writing the next byte to address 3. We can also received data from another computer by monitoring (polling) address 0 to see when data is ready and reading the byte from address 1 when the data is there.

In a modern operating system/OS those are all physical addresses which need to be somehow mapped into virtual addresses.

Real world hardware, such as the one I linked to, will typically use interrupts, so you don't need to poll. It will usually have DMA, so the hardware can access your data directly rather than you feeding it byte by byte. It will handle various protocols and will have registers for checking and setting various aspects of this protocol.

In a modern OS the actual interaction with the hardware is implemented in a device driver and user software can exchange data with the device driver through some API. Again, this user code may be written in assembler or any other language. The API will vary depending on the OS. Communication/networking is generally built as a "stack" with higher level protocols implemented over the lower level ones. Which part of this stack is in a user library or part of the OS will vary between different operating systems.

For the hypothetical device I described above the API may consist of two single byte blocking calls, read() and write(). You then use some sort of system call mechanism from either assembler or a higher level language to call these and pass parameters/retrieve the output. In some operating systems device I/O may look like file I/O so you would use the generic file read/write to perform operations on the device and the OS will dispatch those to the right device driver. Furthermore, in a typical OS the actual system call will be available through some sort of library, which again you may call from various programming languages.

1
votes

There are two pieces of code for doing networking in assembly - the kernel code used by the operating system to actually do the networking, and client code that wants to tell the OS what data to send over the network.

Typically, the hardware in a machine has certain memory addresses dedicated to communicating with the network hardware. The machine code for the OS can then write the appropriate values into this memory to control the hardware that ends up sending and receiving bytes. These memory addresses would be hardcoded into the machine code.

In the case of user code that does networking (say, Mozilla Firefox), the process is different. There is typically a machine instruction or set of instructions that are used for user code to tell the operating system to perform some task (in MIPS, for example, this is syscall, while I think x86 uses the int instruction). Client code would work by setting up some buffers with the appropriate data to send to the network, then would use one of the assembly instructions above to tell the OS that it should send the data. The hardware then invokes the OS, which reads the user data and then uses its own machine code (described above) to actually control the network device appropriately. In this way, the OS can guard direct access to the network device by blocking access to the physical addresses controlling the device and moderating access through system calls. It also means that you don't need to know any memory addresses when writing user code to do networking. The OS handles these details, and all you need to know about is what instruction to execute to trigger the system call.

Hope this helps!