I'm running a simulation where I need to repeatedly extract a single column from a matrix and check each of its values against some condition (e.g. < 10). However, doing so with a matrix is 3 times slower than doing the same thing with a data.frame. Why is this the case?
I'd like to use matrices to store the simulation data because they are faster for some other operations (e.g. updating columns by adding/subtracting values). How can I extract columns / subset a matrix in a faster way?
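For concreteness, here is a minimal sketch of the kind of check described above. It assumes a single integer column named a and a threshold of 10; both are placeholders that mirror the benchmark data used in this post, not the actual simulation.

```r
# Illustrative data only (assumed): one integer column "a", as in the benchmarks in this post
df <- data.frame(a = 1:1e4)
m <- as.matrix(df)

# Extract the column, then test every value against the condition (< 10)
below_df <- df$a < 10       # column taken from the data.frame
below_m  <- m[, "a"] < 10   # same column taken from the matrix

# Both routes flag the same elements; only the extraction step differs
stopifnot(all(below_df == below_m))
sum(below_m)  # number of values below the threshold
```

The comparison itself is the same in both cases, so any timing difference comes from how the column is extracted.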
Extract column from data.frame vs matrix:
df <- data.frame(a = 1:1e4)
m <- as.matrix(df)
library(microbenchmark)
microbenchmark(
df$a,
m[ , "a"])
# Results; Unit: microseconds
# expr         min      lq     mean median      uq     max neval cld
# df$a       5.463  5.8315  8.03997  6.612  8.0275  57.637   100 a
# m[ , "a"] 64.699 66.6265 72.43631 73.759 75.5595 117.922   100 b
Extract single value from data.frame vs matrix:
microbenchmark(
df[1, 1],
df$a[1],
m[1, 1],
m[ , "a"][1])
# Results; Unit: nanoseconds
# expr           min      lq     mean  median      uq    max neval cld
# df[1, 1]      8248  8753.0 10198.56  9818.5 10689.5  48159   100 c
# df$a[1]       4072  4416.0  5247.67  5057.5  5754.5  17993   100 b
# m[1, 1]        517   708.5   828.04   810.0   920.5   2732   100 a
# m[ , "a"][1] 45745 47884.0 51861.90 49100.5 54831.5 105323   100 d
I expected the matrix column extraction to be faster, but it was slower. However, extracting a single value from a matrix (i.e. m[1, 1]) was faster than both of the ways of doing so with a data.frame. I'm lost as to why this is.
Extract row vs column, data.frame vs matrix:
The slowdown above only applies to selecting columns. When selecting rows, matrices are much faster than data.frames. I still don't know why.
microbenchmark(
df[1, ],
m[1, ],
df[ , 1],
m[ , 1])
# Results; Unit: nanoseconds
# expr       min      lq     mean  median      uq   max neval cld
# df[1, ]  16359 17243.5 18766.93 17860.5 19849.5 42973   100 c
# m[1, ]     718   999.5  1175.95  1181.0  1327.0  3595   100 a
# df[ , 1]  7664  8687.5  9888.57  9301.0 10535.5 42312   100 b
# m[ , 1]  64874 66218.5 72074.93 73717.5 74084.5 97827   100 d
m[1,"a"] is faster than subsetting a df. – Rui Barradas