I am a newbie to R. I am assuming that the memory layout is the same for a data frame and a matrix.
Consider the following matrix:
a <- matrix(1:10000000, 1000000, 10)
It has 1M rows and 10 columns. Is the data stored sequentially in memory by row or by column? That is, does physical memory hold [1,1],[2,1],[3,1],...,[1M,1],[1,2],... (column by column) or [1,1],[1,2],...,[1,10],[2,1],... (row by row)?
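A small check that could be run to see the order, as a sketch only: as far as I understand, an R matrix is a vector with a dim attribute, so dropping the dimensions should expose the storage order. The 2-by-3 matrix m below is a toy stand-in for a, and the names m and flat are made up for illustration.

```r
# Toy stand-in for the large matrix: 2 rows, 3 columns.
m <- matrix(1:6, nrow = 2, ncol = 3)

# as.vector() drops the dim attribute and returns the elements
# in their underlying storage order.
flat <- as.vector(m)
print(flat)

# Element [i, j] sits at linear position i + (j - 1) * nrow(m),
# so consecutive rows of one column are adjacent (column-major).
i <- 2; j <- 3
stopifnot(m[i, j] == flat[i + (j - 1) * nrow(m)])
```

If the check passes, element [2,1] comes right after [1,1], which would mean the storage is column by column.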
Suppose the matrix with 10M elements has a size of 100M and the L2 cache is 4M, so the L2 cache cannot hold all 10M elements. If we process the data sequentially, we will have a lower L2 cache miss ratio. In our case, we need to process the data row by row and read several columns at the same time, such as columns A, B, and C, and then compute some result. If memory stores the 10 items of the 1st row first, then the 10 items of the 2nd row, and so on, the performance might be better.
Is there any way to control the memory layout?
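One workaround that might be worth trying, sketched under the assumption that R always stores a matrix column by column: keep a transposed copy, so that each original row becomes a contiguous column. The names ta and row_i are hypothetical, and the matrix is shrunk to keep the example quick.

```r
# Smaller stand-in for the 1M x 10 matrix.
a <- matrix(1:50, nrow = 5, ncol = 10)

# t(a) allocates a new 10 x 5 matrix; column i of ta holds row i of a,
# so the 10 values of one original row are adjacent in memory.
ta <- t(a)

# Reading "row i, columns 1, 2, 3" of a becomes reading the first
# three entries of column i of ta.
row_i <- 4
vals_from_a  <- a[row_i, c(1, 2, 3)]
vals_from_ta <- ta[c(1, 2, 3), row_i]
stopifnot(identical(vals_from_a, vals_from_ta))
```

Note that byrow = TRUE in matrix() only changes the order in which the input values are filled in; it does not appear to change how the resulting matrix is stored. Whether the transposed copy actually helps would have to be measured, since t(a) itself costs a full copy.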
a vs. t(a) to see if rows/column have much of an effect. - Richie Cotton