3
votes

I am trying to create a symmetrical matrix from a lower triangular matrix in R.

In a previous Q & A (Convert upper triangular part of a matrix to symmetric matrix) user 李哲源 said that for large matrices this shouldn't be done in R and proposed a solution in C. But I don't understand C and have never used for example Rccp before so don't know how to interpret the answer. Yet it is obvious that the C code there generates random numbers (rnorm) which I don't want. I want to put in a square matrix and get out a symmetric matrix.

For a given square matrix A that has data in its lower triangle, how would I create a symmetric matrix efficiently in C and use it in R?

2

2 Answers

5
votes

Quickly adapted from the dist2mat function in as.matrix on a distance object is extremely slow; how to make it faster?.

library(Rcpp)

cppFunction('NumericMatrix Mat2Sym(NumericMatrix A, bool up2lo, int bf) {

  IntegerVector dim = A.attr("dim");
  size_t n = (size_t)dim[0], m = (size_t)dim[1];
  if (n != m) stop("A is not a square matrix!");

  /* use pointers */
  size_t j, i, jj, ni, nj;
  double *A_jj, *A_ij, *A_ji, *col, *row, *end;

  /* cache blocking factor */
  size_t b = (size_t)bf;

  /* copy lower triangular to upper triangular; cache blocking applied */
  for (j = 0; j < n; j += b) {
    nj = n - j; if (nj > b) nj = b;
    /* diagonal block has size nj x nj */
    A_jj = &A(j, j);
    for (jj = nj - 1; jj > 0; jj--, A_jj += n + 1) {
      /* copy a column segment to a row segment (or vise versa) */
      col = A_jj + 1; row = A_jj + n;
      for (end = col + jj; col < end; col++, row += n) {
        if (up2lo) *col = *row; else *row = *col;
        }
      }
    /* off-diagonal blocks */
    for (i = j + nj; i < n; i += b) {
      ni = n - i; if (ni > b) ni = b;
      /* off-diagonal block has size ni x nj */
      A_ij = &A(i, j); A_ji = &A(j, i);
      for (jj = 0; jj < nj; jj++) {
        /* copy a column segment to a row segment (or vise versa) */
        col = A_ij + jj * n; row = A_ji + jj;
        for (end = col + ni; col < end; col++, row += n) {
          if (up2lo) *col = *row; else *row = *col;
          }
        }
      }
    }

  return A;
  }')

For a square matrix A, this function Mat2Sym copies its lower triangular part (with transposition) into its upper triangular part to make it symmetric if up2lo = FALSE, and vise versa if up2lo = TRUE.

Note that the function overwrites A without using extra memory. To retain the input matrix and create a new output matrix, pass A + 0 not A into the function.

## an arbitrary asymmetric square matrix
set.seed(0)
A <- matrix(runif(25), 5)
#          [,1]      [,2]       [,3]      [,4]      [,5]
#[1,] 0.8966972 0.2016819 0.06178627 0.7698414 0.7774452
#[2,] 0.2655087 0.8983897 0.20597457 0.4976992 0.9347052
#[3,] 0.3721239 0.9446753 0.17655675 0.7176185 0.2121425
#[4,] 0.5728534 0.6607978 0.68702285 0.9919061 0.6516738
#[5,] 0.9082078 0.6291140 0.38410372 0.3800352 0.1255551

## lower triangular to upper triangular; don't overwrite
B <- Mat2Sym(A + 0, up2lo = FALSE, 128)
#          [,1]      [,2]      [,3]      [,4]      [,5]
#[1,] 0.8966972 0.2655087 0.3721239 0.5728534 0.9082078
#[2,] 0.2655087 0.8983897 0.9446753 0.6607978 0.6291140
#[3,] 0.3721239 0.9446753 0.1765568 0.6870228 0.3841037
#[4,] 0.5728534 0.6607978 0.6870228 0.9919061 0.3800352
#[5,] 0.9082078 0.6291140 0.3841037 0.3800352 0.1255551

## A is unchanged
A
#          [,1]      [,2]       [,3]      [,4]      [,5]
#[1,] 0.8966972 0.2016819 0.06178627 0.7698414 0.7774452
#[2,] 0.2655087 0.8983897 0.20597457 0.4976992 0.9347052
#[3,] 0.3721239 0.9446753 0.17655675 0.7176185 0.2121425
#[4,] 0.5728534 0.6607978 0.68702285 0.9919061 0.6516738
#[5,] 0.9082078 0.6291140 0.38410372 0.3800352 0.1255551

## upper triangular to lower triangular; overwrite
D <- Mat2Sym(A, up2lo = TRUE, 128)
#           [,1]      [,2]       [,3]      [,4]      [,5]
#[1,] 0.89669720 0.2016819 0.06178627 0.7698414 0.7774452
#[2,] 0.20168193 0.8983897 0.20597457 0.4976992 0.9347052
#[3,] 0.06178627 0.2059746 0.17655675 0.7176185 0.2121425
#[4,] 0.76984142 0.4976992 0.71761851 0.9919061 0.6516738
#[5,] 0.77744522 0.9347052 0.21214252 0.6516738 0.1255551

## A has been changed; D and A are aliased in memory
A
#           [,1]      [,2]       [,3]      [,4]      [,5]
#[1,] 0.89669720 0.2016819 0.06178627 0.7698414 0.7774452
#[2,] 0.20168193 0.8983897 0.20597457 0.4976992 0.9347052
#[3,] 0.06178627 0.2059746 0.17655675 0.7176185 0.2121425
#[4,] 0.76984142 0.4976992 0.71761851 0.9919061 0.6516738
#[5,] 0.77744522 0.9347052 0.21214252 0.6516738 0.1255551

On the use of Matrix package

Matrix is particularly useful for sparse matrices. For compatibility it also defines some classes like "dgeMatrix", "dtrMatrix", "dtpMatrix", "dsyMatrix", "dspMatrix" for dense matrices.

Given a square matrix A, the Matrix way to make it symmetric is the following.

set.seed(0)
A <- matrix(runif(25), 5)
#          [,1]      [,2]       [,3]      [,4]      [,5]
#[1,] 0.8966972 0.2016819 0.06178627 0.7698414 0.7774452
#[2,] 0.2655087 0.8983897 0.20597457 0.4976992 0.9347052
#[3,] 0.3721239 0.9446753 0.17655675 0.7176185 0.2121425
#[4,] 0.5728534 0.6607978 0.68702285 0.9919061 0.6516738
#[5,] 0.9082078 0.6291140 0.38410372 0.3800352 0.1255551

## equivalent to: Mat2Sym(A + 0, TRUE, 128)
new("dsyMatrix", x = base::c(A), Dim = dim(A), uplo = "U")
#5 x 5 Matrix of class "dsyMatrix"
#           [,1]      [,2]       [,3]      [,4]      [,5]
#[1,] 0.89669720 0.2016819 0.06178627 0.7698414 0.7774452
#[2,] 0.20168193 0.8983897 0.20597457 0.4976992 0.9347052
#[3,] 0.06178627 0.2059746 0.17655675 0.7176185 0.2121425
#[4,] 0.76984142 0.4976992 0.71761851 0.9919061 0.6516738
#[5,] 0.77744522 0.9347052 0.21214252 0.6516738 0.1255551

## equivalent to: Mat2Sym(A + 0, FALSE, 128)
new("dsyMatrix", x = base::c(A), Dim = dim(A), uplo = "L")
#5 x 5 Matrix of class "dsyMatrix"
#          [,1]      [,2]      [,3]      [,4]      [,5]
#[1,] 0.8966972 0.2655087 0.3721239 0.5728534 0.9082078
#[2,] 0.2655087 0.8983897 0.9446753 0.6607978 0.6291140
#[3,] 0.3721239 0.9446753 0.1765568 0.6870228 0.3841037
#[4,] 0.5728534 0.6607978 0.6870228 0.9919061 0.3800352
#[5,] 0.9082078 0.6291140 0.3841037 0.3800352 0.1255551

Matrix method is sub-optimal for three reasons:

  1. It requires passing x slot as a numeric vector, so we have to do base::c(A) which essentially creates a copy of the matrix in RAM;
  2. It can not do in-place modification, so a new matrix copy is created as the output matrix;
  3. It does not do cache blocking when doing transposed copying.

Here is a quick comparison:

library(bench)
A <- matrix(runif(5000 * 5000), 5000)
bench::mark("Mat2Sym" = Mat2Sym(A, FALSE, 128),
            "Matrix" = new("dsyMatrix", x = base::c(A), Dim = dim(A), uplo = "L"),
            check = FALSE)
#  expression      min     mean   median     max `itr/sec` mem_alloc  n_gc n_itr
#  <chr>      <bch:tm> <bch:tm> <bch:tm> <bch:t>     <dbl> <bch:byt> <dbl> <int>
#1 Mat2Sym      56.8ms   57.7ms   57.4ms  59.4ms     17.3     2.48KB     0     9
#2 Matrix      334.3ms  337.4ms  337.4ms 340.6ms      2.96  190.74MB     2     2

Note how fast Mat2Sym is. Also no memory allocation is made in "overwriting" mode.

As G. Grothendieck mentions, we can also use "dspMatrix".

set.seed(0)
A <- matrix(runif(25), 5)
#          [,1]      [,2]       [,3]      [,4]      [,5]
#[1,] 0.8966972 0.2016819 0.06178627 0.7698414 0.7774452
#[2,] 0.2655087 0.8983897 0.20597457 0.4976992 0.9347052
#[3,] 0.3721239 0.9446753 0.17655675 0.7176185 0.2121425
#[4,] 0.5728534 0.6607978 0.68702285 0.9919061 0.6516738
#[5,] 0.9082078 0.6291140 0.38410372 0.3800352 0.1255551

## equivalent to: Mat2Sym(A + 0, TRUE, 128)
new("dspMatrix", x = A[upper.tri(A, TRUE)], Dim = dim(A), uplo = "U")
#5 x 5 Matrix of class "dspMatrix"
#           [,1]      [,2]       [,3]      [,4]      [,5]
#[1,] 0.89669720 0.2016819 0.06178627 0.7698414 0.7774452
#[2,] 0.20168193 0.8983897 0.20597457 0.4976992 0.9347052
#[3,] 0.06178627 0.2059746 0.17655675 0.7176185 0.2121425
#[4,] 0.76984142 0.4976992 0.71761851 0.9919061 0.6516738
#[5,] 0.77744522 0.9347052 0.21214252 0.6516738 0.1255551

## equivalent to: Mat2Sym(A + 0, FALSE, 128)
new("dspMatrix", x = A[lower.tri(A, TRUE)], Dim = dim(A), uplo = "L")
#5 x 5 Matrix of class "dspMatrix"
#          [,1]      [,2]      [,3]      [,4]      [,5]
#[1,] 0.8966972 0.2655087 0.3721239 0.5728534 0.9082078
#[2,] 0.2655087 0.8983897 0.9446753 0.6607978 0.6291140
#[3,] 0.3721239 0.9446753 0.1765568 0.6870228 0.3841037
#[4,] 0.5728534 0.6607978 0.6870228 0.9919061 0.3800352
#[5,] 0.9082078 0.6291140 0.3841037 0.3800352 0.1255551

Again, Matrix is method is sub-optimal, due to the use of upper.tri or lower.tri.

library(bench)
A <- matrix(runif(5000 * 5000), 5000)
bench::mark("Mat2Sym" = Mat2Sym(A, FALSE, 128),
            "Matrix" = new("dspMatrix", x = A[lower.tri(A, TRUE)], Dim = dim(A),
                           uplo = "L"),
            check = FALSE)
#  expression      min     mean   median     max `itr/sec` mem_alloc  n_gc n_itr
#  <chr>      <bch:tm> <bch:tm> <bch:tm> <bch:t>     <dbl> <bch:byt> <dbl> <int>
#1 Mat2Sym      56.5ms   57.9ms     58ms  58.7ms     17.3     2.48KB     0     9
#2 Matrix      934.7ms  934.7ms    935ms 934.7ms      1.07  524.55MB     2     1

In particular, we see that using "dspMatrix" is even less efficient that using "dsyMatrix".

2
votes

Before making possibly premature optimizations with C/C++ check whether a dense matrix

A + t(A)

is sufficient (assuming A's only nonzero elements are below the diagonal or above it.

Also, if memory is the problem then the Matrix package has the packed symmetric class dspMatrix which can be created like this:

library(Matrix)

A <- matrix(c(0, 2, 3, 0, 0, 4, 0, 0, 0), 3) # dense lower triangular test input
dspA <- as(A + t(A), "dspMatrix")

giving:

> str(dspA)
Formal class 'dspMatrix' [package "Matrix"] with 5 slots
  ..@ x       : num [1:6] 0 2 0 3 4 0   <- only 6 elements stored, not 9
  ..@ Dim     : int [1:2] 3 3
  ..@ Dimnames:List of 2
  .. ..$ : NULL
  .. ..$ : NULL
  ..@ uplo    : chr "U"
  ..@ factors : list()

or one can create it directly from the upper triangular part:

# use upper triangular part since we already created dspA that way
tA <- t(A)
dspA2 <- new("dspMatrix", Dim = as.integer(c(3,3)), 
  x = tA[upper.tri(tA, diag = TRUE)])

identical(dspA, dspA2)
## [1] TRUE