7
votes

I have the upper triangular part of matrix in R (without diagonal) and want to generate a symmetric matrix from the upper triangular part (with 1 on the diagonal but that can be adjusted later). I usually do that like this:

res.upper <- rnorm(4950)
res <- matrix(0, 100, 100)
res[upper.tri(res)] <- res.upper
rm(res.upper)
diag(res) <- 1
res[lower.tri(res)]  <- t(res)[lower.tri(res)]

This works fine but now I want to work with very large matrices. Thus, I would want to avoid having to store res.upper and res (filled with 0) at the same time. Is there any way I can directly convert res.upper to a symmetric matrix without having to initialize the matrix res first?

1
I could write the compiled code, I also do that sometimes to speed up my functions. However, I do not really understand how that avoids using extra memory. In the C/C++ code, I would also first initialize an object like res above. Wouldn't that also use extra memory? Or is that not a problem when using C/C++ because memory allocation is more "intelligent" in these languages? That might be a stupid question but I am a statistician and not a computer scientist, so I really don't know how memory allocation works internally.Lila
You don't have to write the function for me, that's not a problem. I am familiar with compilation and the inline package. I just don't have enough background to understand how that solves my memory problem. But if you assure me that it does I'll write the function (and I'll accept your answer if you give it as an answer instead of a comment).Lila
Have you tried using big.matrix from the bigmemory package? Could be a way around your memory limitationskonvas
The final matrix anyway has to be converted to class kernelMatrix from the library kernlab. I don't know whether this will work with big.matrix but I will definitely have a look at it! Thank you for the hint! I did not know that package, this is the first time I have to work with such big matrices.Lila
You might, also, find helpful "Matrix" package -- i.e. sparseMatrix(i = sequence(1:99), j = rep(2:100, 1:99), x = res.upper, symmetric = TRUE, dims = c(100, 100)) to avoid multiple copying and using (probably) smaller objectsalexis_laz

1 Answers

6
votes

I think there are two issues here.

now I want to work with very large matrices

Then do not use R code to do this job. R will use much more memory than you expect. Try the following code:

res.upper <- rnorm(4950)
res <- matrix(0, 100, 100)
tracemem(res)  ## trace memory copies of `res`
res[upper.tri(res)] <- res.upper
rm(res.upper)
diag(res) <- 1
res[lower.tri(res)]  <- t(res)[lower.tri(res)]

This is what you will get:

> res.upper <- rnorm(4950)  ## allocation of length 4950 vector
> res <- matrix(0, 100, 100)  ## allocation of 100 * 100 matrix
> tracemem(res)
[1] "<0xc9e6c10>"
> res[upper.tri(res)] <- res.upper
tracemem[0xc9e6c10 -> 0xdb7bcf8]: ## allocation of 100 * 100 matrix
> rm(res.upper)
> diag(res) <- 1
tracemem[0xdb7bcf8 -> 0xdace438]: diag<-  ## allocation of 100 * 100 matrix
> res[lower.tri(res)]  <- t(res)[lower.tri(res)]
tracemem[0xdace438 -> 0xdb261d0]: ## allocation of 100 * 100 matrix
tracemem[0xdb261d0 -> 0xccc34d0]: ## allocation of 100 * 100 matrix

In R, you have to use 5 * (100 * 100) + 4950 double words to finish these operations. While in C, you only need at most 4950 + 100 * 100 double words (In fact, 100 * 100 is all that is needed! Will talk about it later). It is difficult to overwrite object directly in R without extra memory assignment.

Is there any way I can directly convert res.upper to a symmetric matrix without having to initialize the matrix res first?

You do have to allocate memory for res because that is what you end up with; but there is no need to allocate memory for res.upper. You can initialize the upper triangular, while filling in the lower triangular at the same time. Consider the following template:

#include <Rmath.h>  // use: double rnorm(double a, double b)
#include <R.h>  // use: getRNGstate() and putRNGstate() for randomness
#include <Rinternals.h>  // SEXP data type

## N is matrix dimension, a length-1 integer vector in R
## this function returns the matrix you want
SEXP foo(SEXP N) {
  int i, j, n = asInteger(N);
  SEXP R_res = PROTECT(allocVector(REALSXP, n * n));  // allocate memory for `R_res`
  double *res = REAL(R_res);
  double tmp;  // a local variable for register reuse
  getRNGstate();
  for (i = 0; i < n; i++) {
    res[i * n + i] = 1.0;  // diagonal is 1, as you want
    for (j = i + 1; j < n; j++) {
      tmp = rnorm(0, 1);  
      res[j * n + i] = tmp; // initialize upper triangular
      res[i * n + j] = tmp;  // fill lower triangular
      }
    }
  putRNGstate();
  UNPROTECT(1);
  return R_res;
  }

The code has not been optimized, as using integer multiplication j * n + i for addressing in the innermost loop will result in performance penalty. But I believe you can move multiplication outside the inner loop, and only leave addition inside.