I think there are two issues here.
now I want to work with very large matrices
Then do not use R code to do this job. R will use much more memory than you expect. Try the following code:
res.upper <- rnorm(4950)
res <- matrix(0, 100, 100)
tracemem(res) ## trace memory copies of `res`
res[upper.tri(res)] <- res.upper
rm(res.upper)
diag(res) <- 1
res[lower.tri(res)] <- t(res)[lower.tri(res)]
This is what you will get:
> res.upper <- rnorm(4950) ## allocation of length 4950 vector
> res <- matrix(0, 100, 100) ## allocation of 100 * 100 matrix
> tracemem(res)
[1] "<0xc9e6c10>"
> res[upper.tri(res)] <- res.upper
tracemem[0xc9e6c10 -> 0xdb7bcf8]: ## allocation of 100 * 100 matrix
> rm(res.upper)
> diag(res) <- 1
tracemem[0xdb7bcf8 -> 0xdace438]: diag<- ## allocation of 100 * 100 matrix
> res[lower.tri(res)] <- t(res)[lower.tri(res)]
tracemem[0xdace438 -> 0xdb261d0]: ## allocation of 100 * 100 matrix
tracemem[0xdb261d0 -> 0xccc34d0]: ## allocation of 100 * 100 matrix
In R, you have to use 5 * (100 * 100) + 4950
double words to finish these operations. While in C, you only need at most 4950 + 100 * 100
double words (In fact, 100 * 100
is all that is needed! Will talk about it later). It is difficult to overwrite object directly in R without extra memory assignment.
Is there any way I can directly convert res.upper
to a symmetric matrix without having to initialize the matrix res
first?
You do have to allocate memory for res
because that is what you end up with; but there is no need to allocate memory for res.upper
. You can initialize the upper triangular, while filling in the lower triangular at the same time. Consider the following template:
#include <Rmath.h> // use: double rnorm(double a, double b)
#include <R.h> // use: getRNGstate() and putRNGstate() for randomness
#include <Rinternals.h> // SEXP data type
## N is matrix dimension, a length-1 integer vector in R
## this function returns the matrix you want
SEXP foo(SEXP N) {
int i, j, n = asInteger(N);
SEXP R_res = PROTECT(allocVector(REALSXP, n * n)); // allocate memory for `R_res`
double *res = REAL(R_res);
double tmp; // a local variable for register reuse
getRNGstate();
for (i = 0; i < n; i++) {
res[i * n + i] = 1.0; // diagonal is 1, as you want
for (j = i + 1; j < n; j++) {
tmp = rnorm(0, 1);
res[j * n + i] = tmp; // initialize upper triangular
res[i * n + j] = tmp; // fill lower triangular
}
}
putRNGstate();
UNPROTECT(1);
return R_res;
}
The code has not been optimized, as using integer multiplication j * n + i
for addressing in the innermost loop will result in performance penalty. But I believe you can move multiplication outside the inner loop, and only leave addition inside.
big.matrix
from thebigmemory
package? Could be a way around your memory limitations – konvassparseMatrix(i = sequence(1:99), j = rep(2:100, 1:99), x = res.upper, symmetric = TRUE, dims = c(100, 100))
to avoid multiple copying and using (probably) smaller objects – alexis_laz