How many bytes the cache controller fetches a time from main memory to L2 cache?

Question

I just read two articles over this topic which provide infomration inconsistent, so I want to know which one is correct. Perhaps both are correct, but under what context?

The first one states that we fetch a page size a time

The cache controller is always observing the memory positions being loaded and loading data from several memory positions after the memory position that has just been read.

To give you a real example, if the CPU loaded data stored in the address 1,000, the cache controller will load data from ”n“ addresses after the address 1,000. This number ”n“ is called page; if a given processor is working with 4 KB pages (which is a typical value), it will load data from 4,096 addresses below the current memory position being load (address 1,000 in our example). In following Figure, we illustrate this example.

enter image description here

The second one states that we fetch sizeof(cache line) + sizeof(prefetcher) a time

So we can summarize how the memory cache works as:

The CPU asks for instruction/data stored in address “a”.

Since the contents from address “a” aren’t inside the memory cache, the CPU has to fetch it directly from RAM.

The cache controller loads a line (typically 64 bytes) starting at address “a” into the memory cache. This is more data than the CPU requested, so if the program continues to run sequentially (i.e. asks for address a+1) the next instruction/data the CPU will ask will be already loaded in the memory cache.

A circuit called prefetcher loads more data located after this line, i.e. starts loading the contents from address a+64 on into the cache. To give you a real example, Pentium 4 CPUs have a 256-byte prefetcher, so it loads the next 256 bytes after the line already loaded into the cache.

It cannot be a full page. Current processors can support pages up to 4 MB in 32-bit mode and 1 GB in 64-bit mode. — ughoavgfhw

Crashworks Crashworks · Accepted Answer · 2012-03-22T20:49:45

Completely hardware implementation dependent. Some implementations load a single line from main memory at a time — and cache line sizes vary a lot between different processors. I've seen line sizes from 64 bytes all the way up to 256 bytes. Basically what the size of a "cache line" means is that when the CPU requests memory from main RAM, it does so n bytes at a time. So if n is 64 bytes, and you load a 4-byte integer at 0x1004, the MMU will actually send 64 bytes across the bus, all the addresses from 0x1000 to 0x1040. This entire chunk of data will be stored in the data cache as one line.

Some MMUs can fetch multiple cache lines across the bus per request -- so that making a request at address 0x1000 on a machine that has 64 byte caches actually loads four lines from 0x1000 to 0x1100. Some systems let you do this explicitly with special cache prefetch or DMA opcodes.

The article through your first link, however, is completely wrong. It confuses the size of an OS memory page with a hardware cache line. These are totally different concepts. The first is the minimum size of virtual address space the OS will allocate at once. The latter is a detail of how the CPU talks to main RAM.

They resemble each other only in the sense that when the OS runs low on physical memory, it will page some not-recently-used virtual memory to disk; then later on, when you use that memory again, the OS loads that whole page from disk back into physical RAM. This is analogous (but not related) to the way that the CPU loads bytes from RAM, which is why the author of "Hardware Secrets" was confused.

A good place to learn all about computer memory and why caches work the way they do is Ulrich Drepper's paper, What Every Programmer Should Know About Memory.

How many bytes the cache controller fetches a time from main memory to L2 cache?

1 Answers