How does a processor fetch cache lines?

Question

When a processor pre-fetches a cache line of data, does it fetch from that address forward by the size of a line, or does it fetch half a cache line forward and half a cache line backward from that address?

For example, assume the cache line is 4 bytes and I pre-fetch from address 0x06. Will it fetch the bytes at 0x06, 0x07, 0x08, 0x09, or the bytes at 0x04, 0x05, 0x06, 0x07?

I need this info for a program which I am writing and need to optimize.

I definitely think that this is highly implementation-dependent. Maybe adding the specifics would help... — ppeterka
A cache line is something like 64 bytes, and it starts at the address with the lowest six bits all zero. You find the cache line of an address by masking out its lowest six bits. (Or whatever power-of-two size your cache line has.) — Kerrek SB
@KerrekSB your comment just answered my question. Okay, so in the example I gave, assuming the cache line is 4 bytes and I'm fetching at address 0x06, what I'll get in the cache will be the bytes at 0x04, 0x05, 0x06 and 0x07. The next cache line would then start at 0x08. So if I want the byte at 0x0A, I would have 0x08, 0x09, 0x0A, 0x0B pre-fetched into the cache! — d2alphame
See this question for finding the actual size. It looks like it's 32 bytes on older Intel CPUs and 64 on contemporary ones. — Kerrek SB

nos · Accepted Answer · 2013-09-11T11:54:56

According to this (which is naturally Intel-specific):

"The cache line size is 32 bytes, or 256 bits. A cache line is filled by a burst of four reads on the processor’s 64-bit data bus."

This means 8 bytes are fetched in parallel from main memory. Within these 8 bytes there is no first or last byte: they arrive simultaneously, because they are fetched over a 64-bit-wide bus.

Since it takes 4 reads to fill a cache line, and Intel does not seem to specify the order of these 4 reads, you're left with some choices, e.g.:

assume that there is no specific order
assume the addresses are fetched from lowest to highest, or vice versa.

The first assumption is of course the safest, since as far as I can find the order is undocumented (so it could depend on the model or other factors).
