6 votes

As homework, I have to determine whether the MARS simulator is big- or little-endian. This seemed pretty straightforward at first, but I am having some issues.

First I tried storing 4 bytes in memory with .byte 0, 0, 0, 1. In memory this appears as 0x01000000, i.e. in reverse order, which seems to indicate that the simulator is little-endian. However, when I load the 4 bytes as an integer into a register, what appears in the register is 0x01000000 again. As I understand it, if the simulator were little-endian, what would be loaded is 0x00000001.

Also, when I store 4 bytes with .word 1, what is stored is 0x00000001, with no bytes reversed this time.

I would like to know whether the simulator is big- or little-endian, and I would like an explanation of this behaviour.

2
What if you do word transactions and byte transactions at the same address, or at the address plus one, two, or three? - old_timer
"in memory this appears as 0x01000000" - only when you view the memory content as words. That view respects the endianness and converts the four bytes 0, 0, 0, 1 to the value 0x01000000 (little-endian: the first byte is *256^0, the second is *256^1, ...). If you viewed the memory as bytes, you would still see 00 00 00 01. The .byte directive does not rearrange bytes in any way; they are stored exactly as you wrote them. - Ped7g
@Ped7g Thanks, that makes sense, but why is the result different when I store a 1 with .word? I believe I am storing exactly the same bytes in the same order, so shouldn't they appear as 0x01000000 as well? - Juan González
By using the .word directive you tell the assembler to treat the following text as a 32-bit integer value, so it parses text like "1", converts it to the integer value 1, and stores it in binary in the correct endianness to preserve that value when you operate on that memory with word instructions. So .word 0x11223344 defines the four bytes 44 33 22 11. Then when you do lw $t0,(...), the value in t0 will be 0x11223344, as you wrote in the source. It would be crazy to write all values in the source in a "byte-swapped" way; that is why .word exists (even though .byte can be used for everything). - Ped7g
And on a big-endian target platform the assembler should compile .word 0x11223344 as the four bytes 11 22 33 44. So the assembler (when targeting the correct platform) hides the endianness fuss from you when you work with word values in source code (text). If you insist on defining particular bytes and want to handle endianness yourself, the assembler lets you do that with byte-size directives like .byte or .space. Then you need to be aware of the target platform and of how to define word values correctly byte by byte. (In your case .word 1 will compile as 01 00 00 00, NOT 0,0,0,1.) - Ped7g

2 Answers

12 votes

There are several layers involved in your question, so I will try to address them one by one...

Machine:

The machine has byte-addressable memory. The first byte has address 0, the second has address 1, and so on. Whenever I write about the content of memory in this answer, I will use this formatting: 01 02 0E 0F 10 ..., with hexadecimal values, spaces between bytes, and addresses increasing continuously from the starting address toward the ending address. That is, if this content started at address 0x800000, the memory would be (all hexadecimal):

address | byte value
------- | ----------
800000  | 01
800001  | 02
800002  | 0E
800003  | 0F
800004  | 10
800005  | ...

So far it does not matter whether the target MIPS platform is little- or big-endian; as long as only byte-sized memory accesses are involved, the order of bytes is "normal".

If you load a byte from address 0x800000 into t0 (with the lb instruction), t0 will be equal to 1.

If you load a word from address 0x800000 into t0 (with the lw instruction), endianness finally comes into play.

On a little-endian machine t0 will be equal to 0x0F0E0201: the first byte of the word (in memory) is the amount of 256^0 (the lowest power), the second is the amount of 256^1, ... and the last one is the amount of 256^3.

On a big-endian machine t0 will be equal to 0x01020E0F: the first byte of the word (in memory) is the amount of 256^3, the second is the amount of 256^2, ... and the last one is the amount of 256^0.

(256 is 2^8, and that magic number comes from "one byte is 8 bits": one bit can hold two values (0 or 1), and one byte has 8 bits, so one byte can hold 2^8 different values.)

In both cases the CPU will read the same four bytes from memory (at addresses 0x800000 to 0x800003), but the endianness defines in which order they will appear as the final 32 bits of word value.
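
The arithmetic above can be checked outside the simulator. The following is only a sketch in Python (not MIPS), and the helper name word_from_bytes is made up for illustration; it weights the same four example bytes by powers of 256 in both orders:

```python
# The four example bytes stored at addresses 0x800000..0x800003.
memory = bytes([0x01, 0x02, 0x0E, 0x0F])

def word_from_bytes(data, little_endian):
    """Combine bytes into one value as a sum of byte * 256**power."""
    value = 0
    for index, byte in enumerate(data):
        # Little-endian: the first byte gets the lowest power (256**0).
        # Big-endian: the first byte gets the highest power (256**3).
        power = index if little_endian else len(data) - 1 - index
        value += byte * 256 ** power
    return value

little = word_from_bytes(memory, little_endian=True)
big = word_from_bytes(memory, little_endian=False)
print(f"little-endian lw: 0x{little:08X}")
print(f"big-endian lw:    0x{big:08X}")
```

Python's built-in int.from_bytes(memory, "little") and int.from_bytes(memory, "big") compute the same two values.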

The t0 register is physically formed by 32 bits on the CPU chip; it has no address. When you want to refer to it in a CPU instruction (i.e. use the value stored in t0), you encode it into the instruction as register $8 ($t0 is a convenience alias for $8 in your assembler, so I use the t0 alias here).

Endianness does not apply to those 32 bits of the register. They are already 32 bits, b0-b31, and once the value 0x0F0E0201 is loaded, those 32 bits are set to 0000 1111 0000 1110 ... (I write them from the top bit b31 down to the bottom bit b0, so that shift left/right instructions make sense and the value reads as a human-formatted binary number). There is no point in thinking about the endianness of a register or about the physical order in which the bits are stored on the chip; it is enough to think of it as a full 32-bit value, and arithmetic instructions will treat it as such.

When a byte value is loaded into a register with lb, it lands in bits b0-b7, and bits b8-b31 each contain a copy of b7 (sign-extending the signed 8-bit value to a signed 32-bit value).
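
As a rough model of that sign extension, here is a Python sketch (not MIPS). The function names lb and lbu are borrowed from the MIPS instructions only for readability; lbu, the zero-extending variant, is not discussed above and is included just for contrast:

```python
def lb(byte_value):
    """Model of lb: copy bit b7 of the byte into bits b8..b31."""
    byte_value &= 0xFF
    if byte_value & 0x80:                # b7 is set: upper 24 bits become 1
        return byte_value | 0xFFFFFF00
    return byte_value                    # b7 is clear: upper 24 bits stay 0

def lbu(byte_value):
    """Model of lbu: upper 24 bits are always 0."""
    return byte_value & 0xFF

for byte in (0x01, 0x7F, 0x80, 0xF0):
    print(f"byte {byte:02X}: lb -> 0x{lb(byte):08X}, lbu -> 0x{lbu(byte):08X}")
```

Bytes below 0x80 come out unchanged in both cases; only bytes with b7 set differ.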

When the value of a register is stored into memory, endianness applies again, i.e. on a little-endian machine storing the word value 0x11223344 into memory will set the individual bytes to 44 33 22 11.

Assembler (source code and compilation)

An assembler that is configured correctly for its target platform will hide endianness from the programmer, to make working with word values convenient.

So when you define memory value like:

myPreciousValue: .word 0x11223344

The assembler will parse the text "0x11223344" (10 bytes: 30 78 31 31 32 32 33 33 34 34), calculate the 32-bit word value 0x11223344 from it, and then apply the target platform's endianness to produce four bytes of machine code. (Your source code is text (!), i.e. one character is one byte value in ASCII encoding. If you write the source in a UTF-8 text editor and use non-ASCII characters, they may be encoded across multiple bytes; the printable ASCII characters have the same encoding in both ASCII and UTF-8 and occupy a single byte only.) The four bytes are either:

44 33 22 11           # little-endian target

or:

11 22 33 44           # big-endian target
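
To see both byte layouts side by side, here is a small Python sketch (not MARS itself) that mimics what the assembler does with a .word value. The helper names assemble_word and show are hypothetical; struct.pack with "<I" and ">I" produces a 32-bit unsigned value in little- and big-endian order respectively:

```python
import struct

def assemble_word(value, little_endian):
    """Return the 4 bytes a .word directive would emit for the given target."""
    fmt = "<I" if little_endian else ">I"
    return struct.pack(fmt, value)

def show(data):
    """Format bytes the way this answer writes memory content."""
    return " ".join(f"{b:02X}" for b in data)

print(show(assemble_word(0x11223344, little_endian=True)))   # little-endian target
print(show(assemble_word(0x11223344, little_endian=False)))  # big-endian target
print(show(assemble_word(1, little_endian=True)))            # the question's .word 1
```

The last line corresponds to the .word 1 case from the question: on a little-endian target the bytes are 01 00 00 00, not 00 00 00 01.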

When you then use the lw instruction in your code to load myPreciousValue from memory into a register, the register will contain the expected word value 0x11223344 (as long as you didn't mix up your assembler configuration and use the wrong endianness; that can't happen in MARS/SPIM, which supports only a little-endian configuration in everything (VM, assembler, debugger)).

So the programmer does not have to think about endianness every time he writes a 32-bit value somewhere in the source; the assembler will parse it and convert it to the byte order of the target.

If the programmer wants to define the four bytes 01 02 03 04 in memory, she can be "smart" and use .word 0x04030201 for a little-endian target platform, but that obfuscates the original intent, so I suggest using .byte 1, 2, 3, 4 in such a case, as the programmer's intent was to define bytes, not a word.

When you declare byte values with the .byte directive, they are compiled in the order in which you write them; no endianness is applied.

Debugger

And finally the memory/register view of the debugger... This tool again tries hard to work in an intuitive/convenient way, so when you check the memory view and have it configured to show bytes, the memory will be shown as:

0x800000: 31 32 33 34 41 42 43 44 | 1234ABCD

When you switch it to "word" view, it will use the configured endianness to concatenate the bytes in the target platform's order, i.e. MARS/SPIM, as a little-endian platform, will show the same memory as:

0x800000: 34333231 44434241

(If the ASCII view is also included, is it "worded" too? If yes, then it will look like 4321 DCBA. I don't have MARS/SPIM installed at the moment to check what their memory view in the debugger actually looks like, sorry.)

So you as the programmer can read the "word" value directly from the display without shuffling the bytes into the "correct" order; you already see what the "word" value will be (from those four bytes of memory content).
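
The conversion from byte view to word view can be sketched in Python as follows. This only models the idea and is not how MARS is implemented; the function name word_view is made up for illustration:

```python
# The same eight bytes as in the byte view above.
dump = bytes([0x31, 0x32, 0x33, 0x34, 0x41, 0x42, 0x43, 0x44])

def word_view(data, byteorder):
    """Group bytes into 4-byte words and combine each in the given byte order."""
    words = []
    for offset in range(0, len(data), 4):
        words.append(int.from_bytes(data[offset:offset + 4], byteorder))
    return words

little_words = word_view(dump, "little")
big_words = word_view(dump, "big")
print("little-endian word view:", " ".join(f"{w:08x}" for w in little_words))
print("big-endian word view:   ", " ".join(f"{w:08x}" for w in big_words))
```

The little-endian line matches the word view shown above; a big-endian debugger would display the same memory as 31323334 41424344.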

The register view usually shows hexadecimal word values by default, i.e. after loading a word from address 0x800000 into t0, register $8 will contain the value 0x34333231 (875770417 in decimal).

If you are interested in the value of the first byte in memory used for that load, at this point you have to apply your knowledge of the target platform's endianness and look either at the first two digits "34" (big-endian) or at the last two digits "31" (little-endian) in the register view (or rather use the memory view in byte-view mode to avoid any mistake).

Runtime detection in code.

So with all the information above, the runtime detection code should be easy to understand (unfortunately I don't have MARS/SPIM at the moment, so I didn't verify that it works; let me know):

.data

checkEndianness: .word 0    # temporary memory for function
    # can be avoided by using stack memory instead (in function)

.text

main:
    jal  IsLittleEndian
    # ... do something with v0 value ...
    li   $v0, 10          # MARS/SPIM syscall 10: exit program
    syscall

# --- functions ---

# returns (in v0) 0 for big-endian machine, and 1 for little-endian
IsLittleEndian:
    # set value of register to 1
    li $v0,1
    # store the word value 1 into memory (4 bytes written)
    sw $v0, checkEndianness
      # memory contains "01 00 00 00" on little-endian machine
      #              or "00 00 00 01" on big-endian machine
    # load only the first byte back
    lb $v0, checkEndianness
    jr $ra
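
The same store-a-word, read-back-the-first-byte trick can be tried on whatever machine you have at hand. The following Python sketch is an analogue of the routine above, not a replacement for it: it reports the byte order of the CPU that runs Python, not of MARS. The function name is_little_endian is made up; struct.pack with "=I" stores a 32-bit value in the host's native byte order:

```python
import struct
import sys

def is_little_endian():
    """Return 1 on a little-endian host and 0 on a big-endian host."""
    # Store the word value 1 in native byte order (the "sw" step).
    stored = struct.pack("=I", 1)
    # Read back only the first byte (the "lb" step):
    # it is 01 on a little-endian host and 00 on a big-endian host.
    return 1 if stored[0] == 1 else 0

print("little-endian host" if is_little_endian() else "big-endian host")
print("sys.byteorder reports:", sys.byteorder)
```

Python already exposes the answer as sys.byteorder, so the manual test is only there to mirror the assembly.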

What is it good for? As long as you write your software for a single target platform, and words are stored/loaded by the target CPU, you don't need to care about endianness.

But if you have multi-platform software that saves binary files... To make the files work the same way on both big- and little-endian platforms, the specification of the file structure must also specify the endianness of the file data. Then, according to that spec, one type of target platform may read the data as "native" word values, while the other will have to shuffle the bytes in word values to read the correct word value (plus the spec should also specify how many bytes a "word" is :) ). Such a runtime test may then be handy if you include the shuffler in the save/load routines and use the endianness detection routine to decide whether the word bytes have to be shuffled or not. That will make the target platform's endianness "transparent" to the remaining code, which will simply send its native "word" values to the save/load routine, and your save/load routines may use the same source on every platform (at least as long as you use a portable multi-platform programming language like C; of course, assembly for MIPS will not work on different CPUs at all and would need to be rewritten from scratch).
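
A minimal sketch of such save/load routines, written in Python for brevity. Everything here is an assumption made for illustration: the file format is imagined to declare little-endian 32-bit words, and the names FILE_BYTE_ORDER, needs_shuffle, save_word and load_word are hypothetical:

```python
import struct
import sys

FILE_BYTE_ORDER = "little"   # assumption: the imaginary file spec says little-endian

def needs_shuffle():
    """True when the host's native order differs from the file's order."""
    return sys.byteorder != FILE_BYTE_ORDER

def save_word(value):
    """Return the 4 bytes to write to the file for a 32-bit value."""
    raw = struct.pack("=I", value)           # native order, like sw
    return raw[::-1] if needs_shuffle() else raw

def load_word(raw):
    """Return the 32-bit value encoded in 4 bytes read from the file."""
    if needs_shuffle():
        raw = raw[::-1]                       # shuffle into native order first
    return struct.unpack("=I", raw)[0]        # native order, like lw

file_bytes = save_word(0x11223344)
print(" ".join(f"{b:02X}" for b in file_bytes))
print(hex(load_word(file_bytes)))
```

The calling code only ever passes and receives native values; the file bytes are 44 33 22 11 on either kind of host.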

Network communication is also often done with custom binary protocols (usually wrapped in the most common TCP/IP packets for the network layer, or even encrypted, but your application will extract the raw byte content from them at some point). The endianness of the sent/received data then matters, and the "other" platforms have to shuffle the bytes to read the correct values.
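
As an illustration, here is a Python sketch of a made-up message header consisting of a 16-bit message id followed by a 32-bit length. The header layout and the names encode_header and decode_header are hypothetical; the "!" prefix of the struct module selects network byte order, which is big-endian:

```python
import struct

def encode_header(message_id, length):
    """Pack a 16-bit id and a 32-bit length in network (big-endian) order."""
    return struct.pack("!HI", message_id, length)

def decode_header(raw):
    """Unpack the 6 header bytes back into (message_id, length)."""
    return struct.unpack("!HI", raw)

wire = encode_header(0x0102, 0x11223344)
print(" ".join(f"{b:02X}" for b in wire))
print(decode_header(wire))
```

Because the byte order is fixed by the format string, sender and receiver agree on the bytes on the wire regardless of their own CPU's endianness.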

Other platforms (non-MIPS)

Pretty much everything above applies; just check what a byte and a word are on the other platform (I think the byte has been pretty much set in stone as 8 bits for the last 35+ years, but the word may differ; for example, on x86 platforms a word is only 16 bits). Still, a little-endian machine will read "word" bytes in "reversed" order, with the first byte used as the amount of the smallest power, 256^0, and the last byte used as the amount of the highest power of 256 (256^1 on the x86 platform, as only two bytes form a word there; the MIPS "word" is called a "double word" or "dword" in the x86 world).
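
A short Python sketch of the size difference, using four arbitrary example bytes (the variable names are made up for illustration):

```python
raw = bytes([0x34, 0x12, 0x78, 0x56])

# x86 "word": only the first two bytes, 0x34 * 256**0 + 0x12 * 256**1.
x86_word = int.from_bytes(raw[:2], "little")

# x86 "dword" (same size as a MIPS word): all four bytes, powers 256**0..256**3.
x86_dword = int.from_bytes(raw, "little")

print(f"word:  0x{x86_word:04X}")
print(f"dword: 0x{x86_dword:08X}")
```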

6 votes

This is from the site: http://courses.missouristate.edu/KenVollmar/mars/Help/MarsHelpDebugging.html

Memory addresses and values, and register values, can be viewed in either
decimal or hexadecimal format. All data are stored in little-endian
byte order (each word consists of byte 3 followed by byte 2 then 1 then 0).
Note that each word can hold 4 characters of a string and those 4 
characters will appear in the reverse order from that of the string literal

As you can see, it is little-endian.
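
The remark in the quote about string characters appearing reversed can be reproduced with a few lines of Python. This is only a model of the word display, not MARS code, and the variable names are made up:

```python
text = b"ABCD"                     # bytes in memory: 41 42 43 44

# A little-endian machine combines the four bytes into one word value.
word = int.from_bytes(text, "little")

# Reading that word's hex digits from left to right gives the bytes
# from most significant to least significant, i.e. in reversed order.
shown = word.to_bytes(4, "big").decode("ascii")

print(f"word value: 0x{word:08x}")
print("characters as they appear in the word:", shown)
```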