I was reading that R uses column-major storage for matrices, which means that the elements of each column are stored contiguously in memory, one column after another. This made me wonder:
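To make the storage claim concrete, here is a minimal sketch using a small hypothetical matrix (the names m, storage_order, k, i and j are just for illustration). Flattening the matrix with as.vector() returns the elements in the order they are stored, and a linear index can be mapped back to a row and column:

```r
# A 2 x 3 matrix filled with 1..6 using the default byrow = FALSE.
m <- matrix(1:6, nrow = 2, ncol = 3)

# Flattening returns the elements in storage order:
# all of column 1, then all of column 2, and so on.
storage_order <- as.vector(m)
print(storage_order)

# Linear index k corresponds to
#   row    ((k - 1) %%  nrow) + 1
#   column ((k - 1) %/% nrow) + 1
k <- 4
i <- (k - 1) %% nrow(m) + 1
j <- (k - 1) %/% nrow(m) + 1
print(m[i, j] == m[k])
```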
Is it faster to fill a matrix by row (using byrow=TRUE in the base R function matrix()), or is it faster to fill the matrix by column first (using the default byrow=FALSE) and then transpose it using t()?
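For reference, here is a small sketch showing that the two approaches produce the same matrix. The data and dimension names (x, n_row, n_col) are hypothetical. For a non-square result, the column-filled matrix needs its dimensions swapped before transposing; for a square matrix of a single constant, as in my benchmark, this makes no difference.

```r
# Hypothetical example data: 6 distinct values so the fill order is visible.
x <- 1:6
n_row <- 2
n_col <- 3

# Approach 1: fill row by row directly.
by_row <- matrix(x, nrow = n_row, ncol = n_col, byrow = TRUE)

# Approach 2: fill column by column with swapped dimensions, then transpose.
by_col_then_t <- t(matrix(x, nrow = n_col, ncol = n_row))

print(by_row)
print(identical(by_row, by_col_then_t))
```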
I benchmarked both approaches with microbenchmark().
Filling a Matrix by Row
> microbenchmark(matrix(1, n, n, byrow=TRUE))
Unit: seconds
expr min lq mean median uq max neval
matrix(1, n, n, byrow = TRUE) 1.047379 1.071353 1.105468 1.081795 1.112995 1.628675 100
Filling a Matrix by Column and Then Transposing It
> microbenchmark(t(matrix(1, n, n)))
Unit: seconds
expr min lq mean median uq max neval
t(matrix(1, n, n)) 1.43931 1.536333 1.692572 1.61793 1.726244 3.070821 100
Conclusion
It seems that it's faster to fill the matrix by row! Am I missing something? I would have thought that R would just do some relabelling with t(), but it's actually slower than filling in the matrix by row!
Is there an explanation for this? I'm quite baffled.
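One thing that may be relevant, shown as a small sketch with a hypothetical matrix (m and tm are illustrative names): if storage is column-major, the transposed matrix has its elements in a different order in memory, so t() presumably has to build a new matrix with the elements rearranged rather than just swapping the dimensions.

```r
m <- matrix(1:6, nrow = 2, ncol = 3)
tm <- t(m)

# Same values, but the flattened (storage) order differs between m and t(m),
# so swapping the dim attribute alone would not give the right matrix.
print(as.vector(m))
print(as.vector(tm))
print(identical(as.vector(m), as.vector(tm)))
```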
Observation
After ThomasIsCoding's answer, and after running the benchmark a few more times myself, it looks like the result depends on the number of rows and the number of columns:
- Number of Rows < Number of Columns: t() is faster.
- Number of Rows = Number of Columns: byrow=TRUE is faster.
- Number of Rows > Number of Columns: byrow=TRUE is faster.
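A rough sketch of how this comparison across shapes could be reproduced is below. It is not the exact code I ran. It uses base R's system.time() instead of microbenchmark so that it needs no extra packages. The helper median_time(), the shapes and the repetition count are all hypothetical, and it assumes that "rows" and "columns" refer to the final matrix, so the transpose variant fills a matrix with swapped dimensions first. Timings will vary by machine and R version, and the sizes may need to be increased to get stable numbers.

```r
# Hypothetical helper: median elapsed time (in seconds) over `reps` calls of f().
median_time <- function(f, reps = 5) {
  median(replicate(reps, system.time(f())[["elapsed"]]))
}

# Hypothetical shapes as c(rows, columns) of the final matrix;
# kept small so the sketch runs quickly.
shapes <- list(
  wide   = c(200, 2000),
  square = c(600, 600),
  tall   = c(2000, 200)
)

# One row per shape, one column per approach.
results <- t(sapply(shapes, function(d) {
  nr <- d[1]
  nc <- d[2]
  c(
    byrow     = median_time(function() matrix(1, nr, nc, byrow = TRUE)),
    transpose = median_time(function() t(matrix(1, nc, nr)))
  )
}))

print(results)
```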