6
votes

From gene expression data (40000 genes (variables) x 30 observations) I want to create a 40000 x 40000 covariance matrix. This is definitely larger than my RAM. With the package 'ff' I managed to preallocate an empty 40000 x 40000 matrix for the correlations. However, the 'cov' or 'cor' function can only handle a 5000 x 5000 covariance matrix on my system, so I have to do blockwise covariance calculations (1:5000, 5001:10000, etc.) and fill the preallocated matrix along the diagonal. Does anybody know of an algorithm to fill the "missing patches" in the matrix, i.e. the covariance (or correlation) between, say, genes 1 and 22000? I know I can compute all pairwise combinations and fill in the matrix one by one, but 'cor' is quite fast... So, is there a way to calculate the cov (or cor) of genes 1 and 22000 from the already calculated covariances?

Thanks in advance!


1 Answer

1
votes

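You can call cov with two arguments to compute the off-diagonal blocks. cov(x, y) returns the covariances between the columns of x and the columns of y, so with genes in the columns of x: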

cov( x[,1:5000], x[,5001:10000] )
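
To fill the whole matrix, the same call can be looped over every pair of blocks. The following is only a rough sketch under some assumptions: x has observations in rows and genes in columns, the sizes are shrunk so that it runs quickly, and result is an ordinary matrix standing in for the preallocated 'ff' matrix. The names n_genes, block, starts, cols and result are made up for the illustration.

```r
# Small stand-in for the real data: rows are observations, columns are genes.
set.seed(1)
n_obs   <- 30
n_genes <- 200   # 40000 in the question
block   <- 50    # 5000 in the question
x <- matrix(rnorm(n_obs * n_genes), nrow = n_obs, ncol = n_genes)

# Ordinary matrix here; in the question this would be the preallocated ff matrix.
result <- matrix(NA_real_, n_genes, n_genes)

# First column of each block, and a helper returning the column range of block i.
starts <- seq(1, n_genes, by = block)
cols   <- function(i) starts[i]:min(starts[i] + block - 1, n_genes)

for (i in seq_along(starts)) {
  for (j in i:length(starts)) {
    ci <- cols(i)
    cj <- cols(j)
    # Covariances between the genes of block i and the genes of block j.
    blk <- cov(x[, ci, drop = FALSE], x[, cj, drop = FALSE])
    result[ci, cj] <- blk
    # The covariance matrix is symmetric, so the mirrored block is the transpose.
    if (j > i) result[cj, ci] <- t(blk)
  }
}
```

Only the blocks on and above the diagonal are computed; the lower blocks are copied as transposes, which roughly halves the work. Replacing cov with cor in the loop should give the correlation matrix in the same way.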