I am reading a number of papers, and they either use "store buffer" and "store queue" interchangeably or use them to refer to different structures, and I just cannot follow along. This is what I thought a store queue was:
- It is an associatively searchable FIFO queue that keeps information about store instructions in fetch order.
- It keeps store addresses and data.
- It keeps a store instruction's data until the instruction becomes non-speculative, i.e. until it reaches the retirement stage. A store's data is sent from the store queue to memory (the L1 cache in this case) only at retirement. This is important because we do not want speculative store data written to memory: it would corrupt the in-order memory state, and we would not be able to repair that state after a misprediction.
- Upon a misprediction, the store queue entries of store instructions that were fetched after the mispredicted instruction are removed.
- Load instructions send a read request to both the L1 cache and the store queue. If data for the same address is found in the store queue, it is forwarded to the load; otherwise, the data fetched from L1 is used.
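To make the model above concrete, here is a minimal sketch (entirely my own, not from any of the papers) of a store queue that keeps address and data together in one FIFO, as I described: loads search it associatively, retirement releases data to L1, and a misprediction squashes younger entries.

```python
# Toy model of a unified store queue: each entry holds a store's sequence
# number (fetch order), address, and data. Names and structure are my own
# illustration, not a real processor's implementation.
from collections import deque

class StoreQueue:
    def __init__(self):
        self.entries = deque()  # kept in fetch order (oldest at the left)

    def insert(self, seq, addr, data):
        """Allocate an entry when a store is dispatched."""
        self.entries.append({"seq": seq, "addr": addr, "data": data})

    def forward(self, load_seq, addr):
        """Associative search: the youngest store older than the load that
        matches the address supplies the data; otherwise use L1."""
        for e in reversed(self.entries):
            if e["seq"] < load_seq and e["addr"] == addr:
                return e["data"]
        return None  # no match: the load takes the L1 value

    def retire_head(self, write_to_l1):
        """At retirement, release the oldest store's data to memory."""
        e = self.entries.popleft()
        write_to_l1(e["addr"], e["data"])

    def squash_after(self, seq):
        """On a misprediction, drop stores fetched after instruction seq."""
        while self.entries and self.entries[-1]["seq"] > seq:
            self.entries.pop()
```

The `forward` loop is exactly the part the SSB paper calls the "non-scalable associative search": every load compares its address against every in-flight store.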
I am not sure what a store buffer is, but I was thinking it was just buffer space holding the data of retired store instructions waiting to be written to memory (again, L1).
Now, here is why I am getting confused. In this paper, it is stated that "we propose the scalable store buffer [SSB], which places private/speculative values directly into the L1 cache, thereby eliminating the non-scalable associative search of conventional store buffers." I am thinking that the non-scalable associatively searchable conventional structure they are talking about is what I know as a store queue, because they also say that
SSB eliminates the non-scalable associative search of conventional store buffers by forwarding processor-visible/speculative values to loads directly from the L1 cache.
As I mentioned above, as far as I know, data forwarding to loads is done through the store queue. In the footnote on the first page, it is also stated that
We use "store queue" to refer to storage that holds stores’ values prior to retirement and "store buffer" to refer to storage containing retired store values prior to their release to memory.
This is in line with what I explained above, but then it conflicts with the 'store buffer' in the first quote. The footnote corresponds to one of the references in the paper. In that reference, they say
a store buffer is a mechanism that exists in many current processors to accomplish one or more of the following: store access ordering, latency hiding and data forwarding.
Again, I thought the mechanism accomplishing those is called a store queue. In the same paper they later say
non-blocking caches and buffering structures such as write buffers, store buffers, store queues, and load queues are typically employed.
So, they mention the store buffer and the store queue separately, but the store queue is not mentioned again later. They say
the store buffer maintains the ordering of the stores and allows stores to be performed only after all previous instructions have been completed
and their store buffer model is the same as Mike Johnson's model. In Johnson's book (Superscalar Microprocessor Design), stores first go to a store reservation station in fetch order. From there, they are sent to the address unit, and from the address unit they are written into a "store buffer" along with their corresponding data. Load forwarding is handled through this store buffer. Once again, I thought this structure was called a store queue. In reference #2, the authors also mention that
The Alpha 21264 microprocessor has a 32-entry speculative store buffer where a store remains until it is retired.
I looked at a paper about Alpha 21264, which states that
Stores first transfer their data across the data buses into the speculative store buffer. Store data remains in the speculative store buffer until the stores retire. Once they retire, the data is written into the data cache on idle cache cycles.
Also,
The internal memory system maintains a 32-entry load queue (LDQ) and a 32-entry store queue (STQ) that manages the references while they are in-flight. [...] Stores exit the STQ in fetch order after they retire and dump into the data cache. [...] The STQ CAM logic controls the speculative data buffer. It enables the bypass of speculative store data to loads when a younger load issues after an older store.
So, it sounds like in the Alpha 21264 there is a store queue that keeps bookkeeping information about store instructions in fetch order, but it does not hold their data. The data of store instructions is kept separately, in the speculative store buffer, which the STQ's CAM logic controls.
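The split organization the 21264 description suggests could be sketched like this (again my own toy illustration, with made-up names, just to contrast it with the unified model above): the STQ is the searchable CAM of addresses and ages, while the values themselves sit in a separate data buffer that a matching STQ entry indexes.

```python
# Toy model of a split STQ + speculative data buffer, in the spirit of the
# Alpha 21264 description: the CAM search and the data storage are separate
# structures. All names here are my own, not the processor's.
class SplitStoreQueue:
    def __init__(self):
        self.stq = []          # (seq, addr, slot): the searchable CAM part
        self.data_buffer = {}  # slot -> data: the "speculative store buffer"
        self.next_slot = 0

    def insert(self, seq, addr, data):
        slot = self.next_slot
        self.next_slot += 1
        self.stq.append((seq, addr, slot))      # bookkeeping in fetch order
        self.data_buffer[slot] = data           # data lives outside the CAM

    def bypass(self, load_seq, addr):
        """The STQ CAM picks the youngest older matching store; its slot
        then indexes the data buffer to supply the bypassed value."""
        for seq, a, slot in reversed(self.stq):
            if seq < load_seq and a == addr:
                return self.data_buffer[slot]
        return None
```

Functionally this behaves like the unified queue for forwarding; the difference is purely where the data physically lives, which may be exactly why some authors say "store buffer" for the data storage and "store queue" for the ordering/search logic.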
So, after all of this, I am not sure what a store buffer is. Is it just an auxiliary structure for a store queue, or is it a completely different structure that holds data waiting to be written to L1? Or is it something else? I feel like some authors mean "store queue" when they say "store buffer". Any ideas?