2
votes

I'm reading about word size and it states the following:

Every computer has a word size, indicating the nominal size of pointer data. Since a virtual address is encoded by such a word, the most important system parameter determined by the word size is the maximum size of the virtual address space. That is, for a machine with a w-bit word size, the virtual addresses can range from 0 to 2^w - 1.

A 32-bit word size limits the virtual address space to 4 gigabytes, that is, just over 4 x 10^9 bytes.

Is my understanding correct that it's the number of possible addresses, not the total memory size? For example, if a word is 2 bits, then I'd have 4 addresses (2^2), however, the contents placed in these addresses may have much bigger size. I can access a struct of total 1GB, and place it under address[0], so that even though the addresses are limited to 2 bits, the memory is over 1GB.

2
Pointers are free to be represented in any way, but they're usually byte offsets, and so limit the memory itself. - Ry-
@Ap31, it's this book - Max Koretskyi

2 Answers

3
votes

Inside a computer there is an address bus and a data bus, and they are not the same bus. For example, you can have a 16-bit address bus and a 32-bit data bus. The number of possible addresses would be 65536, and the total memory size would be 262144 bytes (65536 words of 4 bytes each) if it were a word-addressed machine, or 65536 bytes if it were a byte-addressed machine.
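To make the arithmetic concrete, here is a minimal sketch of the calculation. The function name `memory_size_bytes` is just something I made up for illustration; the numbers are the ones from the example above.

```python
def memory_size_bytes(address_bits, bytes_per_unit):
    """Addressable memory = number of addresses * size of each addressable unit."""
    return (1 << address_bits) * bytes_per_unit

ADDRESS_BITS = 16      # width of the address bus
WORD_BYTES = 32 // 8   # a 32-bit data bus moves 4-byte words

print(1 << ADDRESS_BITS)                            # number of addresses: 65536
print(memory_size_bytes(ADDRESS_BITS, WORD_BYTES))  # word-addressed: 262144
print(memory_size_bytes(ADDRESS_BITS, 1))           # byte-addressed: 65536
```

The number of addresses is fixed by the address width alone; the total memory depends on how much data sits behind each address.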

Most CPUs these days are byte-addressed. Some old mainframes and supercomputers (e.g. the old Crays) were word-addressed. On word-addressed machines there was no real concept of a "byte" anyway, so the memory size was quoted in words.

In effect the byte has become accepted as the smallest unit of memory that is useful; not too small, not too big. For example, a bit-addressed machine would need 8 times as many addresses (3 extra address bits) for the same sized memory, which is very inconvenient, and addressing only in 64-bit words would make a lot of general purpose computing wasteful.

Everything is Word Addressed Really

The irony here is that the design of caches and DDR memory means that when your program addresses a single byte, a whole cache line's worth of data is loaded from DDR over the data bus. DDR4's data bus is 64 bits wide, so at a minimum 64 bits are loaded when your program accesses a single byte. So your program "thinks" it is living in a byte-addressed environment, but the physical reality of the memory interface is that it is 64-bit word addressed.
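As a rough sketch of what that means for a single byte access: the hardware fetches the aligned block containing the byte and then picks the byte out of it. The 64-byte cache line size below is an assumption (it is common, but CPU-specific), and `containing_block` is a made-up helper name.

```python
BUS_BYTES = 8           # 64-bit DDR4 data bus
CACHE_LINE_BYTES = 64   # assumed cache line size; varies by CPU

def containing_block(byte_address, block_bytes):
    """Return (start of the aligned block holding the byte, offset within it).

    block_bytes must be a power of two.
    """
    start = byte_address & ~(block_bytes - 1)  # clear the low address bits
    return start, byte_address - start

start, offset = containing_block(0x1003, BUS_BYTES)
print(hex(start), offset)   # the 64-bit bus word that holds byte 0x1003

start, offset = containing_block(0x1047, CACHE_LINE_BYTES)
print(hex(start), offset)   # the cache line that holds byte 0x1047
```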

3
votes

Your book is not entirely truthful, but as a general simple rule it holds fairly well.

The common exception today is a word size that is bigger than the address range, as in most 64-bit CPUs (all the ones I know in enough detail to understand their address range). x86-64, for example, has a 48-bit virtual address range, which technically can be expanded to 53 bits, I believe. To go further than that would require a new design of the page-table layout in the machine, so not a trivial change at all. Addresses outside the 48 valid bits are rejected: what you actually get is 47 bits, with the 48th (top, or sign) bit copied into the remaining 16 bits, which ensures that nobody uses the top 16 bits for "clever stuff" that would break if/when the address range is expanded. AArch64 (ARM's 64-bit processor architecture) also uses only part of the 64 bits for virtual addresses - I think it is 48 bits here too.
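The sign-extension rule is easier to see in code. This is only a sketch of the check, assuming 48 valid bits and addresses given as unsigned 64-bit integers; `is_canonical` and `sign_extend` are names I picked for illustration, not anything the CPU or an OS exposes.

```python
VA_BITS = 48  # assumed number of implemented virtual address bits

def is_canonical(addr, va_bits=VA_BITS):
    """True if bits 63 down to va_bits-1 are all zeros or all ones."""
    upper = addr >> (va_bits - 1)             # bits 63..47 when va_bits is 48
    all_ones = (1 << (64 - va_bits + 1)) - 1  # 17 one-bits when va_bits is 48
    return upper == 0 or upper == all_ones

def sign_extend(addr, va_bits=VA_BITS):
    """Copy bit va_bits-1 into all higher bits of a 64-bit address."""
    low = addr & ((1 << va_bits) - 1)
    if low >> (va_bits - 1):                  # top valid bit is set
        low |= ((1 << 64) - 1) ^ ((1 << va_bits) - 1)
    return low

print(is_canonical(0x00007FFFFFFFFFFF))   # top of the lower half: True
print(is_canonical(0xFFFF800000000000))   # bottom of the upper half: True
print(is_canonical(0x0000800000000000))   # in the hole between them: False
print(hex(sign_extend(0x0000800000000000)))
```

So the usable space is two 47-bit halves, one at the bottom and one at the top of the 64-bit range, with a large unusable gap in between.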

On the other hand, 25+ years ago, when 16-bit x86 computers were the standard, they could address more than the 16 bits of a "word" thanks to segment registers. The segment register value is shifted left 4 bits and then added to the 16-bit register value, which allows for 20 bits of address range [1]. In the 80286 processor in protected mode, the segment register instead contains an index into a segment descriptor table, which holds a base address that is added to the regular register value, giving a 24-bit address in total.
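Here is a small sketch of the real-mode calculation, assuming 16-bit segment and offset values; `real_mode_address` is a made-up name. The last lines show why the result can be slightly more than 20 bits, and what happens on a chip with only 20 address lines.

```python
def real_mode_address(segment, offset):
    """8086 real mode: linear address = segment * 16 + offset."""
    assert 0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF
    return (segment << 4) + offset

print(hex(real_mode_address(0x1234, 0x5678)))  # 0x179b8

highest = real_mode_address(0xFFFF, 0xFFFF)
print(hex(highest))            # 0x10ffef, one bit more than 20 bits can hold
print(hex(highest & 0xFFFFF))  # 0xffef, wrapped around with 20 address lines
```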

And of course, many processors have a more restrictive physical address range than that of the virtual address, because it's "expensive" to put pins for addressing memory onto the outside of the processor. The 68000 is a 32-bit processor, but has only a 24-bit external address bus, giving 16MB of memory range. Early x86-64 processors have only 40 bits of physical address, giving 1TB of memory range.

[1] Actually a tiny bit over, because the segment value shifted left by four bits is a 20-bit value with the low four bits zero, and adding a 16-bit offset to it can carry past 20 bits.