5
votes

I will be working in R with a quite large (7 e6 x 4.5 e3) but very sparse matrix. So I am trying to understand how to deal with sparse matrices efficiently. I have two related questions.

First: I have been given to understand that the Matrix package links automatically to the LAPACK and SuiteSparse compiled dlls. (I am working in Windows.) It was my impression that use of the SuiteSparse routines would shorten execution time compared with use of dense matrices using the LAPACK suite. But the test run below suggests that the run-time for the sparse version of the matrix is much slower than the dense version.

> library(Matrix)
> sparse <- sparseMatrix(1:4, 1:4, x=rnorm(4))
> dense <- as.matrix(sparse)
> x <- 1:4
> system.time(for (i in 1:10000) sparse %*% x)
   user  system elapsed 
   0.23    0.00    0.23 
> system.time(for (i in 1:10000) dense %*% x)
   user  system elapsed 
      0       0       0 
> system.time(for (i in 1:1000) solve(sparse))
   user  system elapsed 
   3.94    0.00    3.94 
> system.time(for (i in 1:1000) solve(dense))
   user  system elapsed 
   0.05    0.00    0.05

a) Am I correct that Matrix connects automatically with the above two compiled libraries? If not, how do I link to these DLLs? b) Is the use of sparse matrix algebra in general actually that much slower than use of dense matrix algebra?

Second: I have installed the RcppEigen and RcppArmadillo packages. I have been able to compile a test program with RcppArmadillo (using the paper by Dirk Eddelbuettel and Conrad Sanderson). But I have not, for the life of me, been able to find a similar introduction to RcppEigen that would give me some model code that I could use to get started. Can any of you point to a document similar to the Eddelbuettel and Sanderson paper that can help me get started with RcppEigen?

1
your second question is a request for off-site resources (someone may respond in comments, but technically it's off-topic for StackOverflow)Ben Bolker

1 Answers

9
votes

(A little too long for a comment.) I would start by profiling this for much larger matrices; I can imagine that the sparse algorithms are at a disadvantage when the matrix is small and not very sparse (e.g. in this case 25% of the cells are non-zero). In the example below (1000x1000 matrices), the sparse solver is 26 times faster than the dense solver. You may find that the Matrix routines are fast enough for your purposes, without taking on the additional cognitive overhead of learning (Rcpp)Eigen/(Rcpp)Armadillo ...

library(rbenchmark)
library(Matrix)
set.seed(101)
sparse <- sparseMatrix(1:1000,1:1000,x=rnorm(1000))
dense <- as.matrix(sparse)
benchmark(solve(sparse),solve(dense),replications=20,
          columns = c(
       "test", "replications", "elapsed", "relative", "user.self"))
##            test replications elapsed relative user.self
## 2  solve(dense)           20   6.932   26.868     6.692
## 1 solve(sparse)           20   0.258    1.000     0.256