18
votes

I have two matrices in R that I want to multiply:

a = matrix(rnorm(20*10000, mean=0, sd=5), 20, 10000)
b = matrix(rnorm(20*10000, mean=0, sd=5), 20, 10000)
t(a)%*%b

Given that the dimension in larger this matrix multiplication takes a lot of time, is there a specific way to make computations faster ? And are there any build in functions in R to make such multiplications faster?

3
You can try using RevolutionR which is an enhanced distribution of R. Can be found here: mran.revolutionanalytics.com/openRaad

3 Answers

31
votes

There are many ways to approach this depending upon your code, effort, and hardware.

  1. Use the 'best' function for the job

The simplest is to use crossprod which is the same as t(a)%*% b (Note - this will only be a small increase in speed)

crossprod(a,b) 
  1. Use Rcpp (and likely RcppEigen/RcppArmadillo).

C++ will likely greater increase the speed of your code. Using linear algebra libraries will likely also help this further (hence Eigen and Armadillo). This however assumes you are willing to write some C++.

  1. Use a better backend

After this point you are looking at BLAS backends such as OpenBLAS, Atlas, etc. Hooking these up to R varies depending upon your OS. It is quite easy if you are using a Debian system like Ubuntu. You can find a demo here. These can sometimes be leveraged further by libraries such as Armadillo and Eigen.

  1. GPU Computing

If you have a GPU (e.g. AMD, NVIDIA, etc.) you can leverage the many cores within to greatly speed up your computations. There are a few that could be useful including gpuR, gputools, and gmatrix

EDIT - to address @jenesaiquoi comment on benefit of Rcpp

test.cpp

// [[Rcpp::depends(RcppArmadillo, RcppEigen)]]

#include <RcppArmadillo.h>
#include <RcppEigen.h>

// [[Rcpp::export]]
SEXP armaMatMult(arma::mat A, arma::mat B){
    arma::mat C = A * B;

    return Rcpp::wrap(C);
}

// [[Rcpp::export]]
SEXP eigenMatMult(Eigen::MatrixXd A, Eigen::MatrixXd B){
    Eigen::MatrixXd C = A * B;

    return Rcpp::wrap(C);
}

// [[Rcpp::export]]
SEXP eigenMapMatMult(const Eigen::Map<Eigen::MatrixXd> A, Eigen::Map<Eigen::MatrixXd> B){
    Eigen::MatrixXd C = A * B;

    return Rcpp::wrap(C);
}

test.R

library(Rcpp)

A <- matrix(rnorm(10000), 100, 100)
B <- matrix(rnorm(10000), 100, 100)

library(microbenchmark)
sourceCpp("test.cpp")
microbenchmark(A%*%B, armaMatMult(A, B), eigenMatMult(A, B), eigenMapMatMult(A, B))

Unit: microseconds
                  expr     min       lq     mean   median       uq      max neval
               A %*% B 885.846 892.1035 933.7457 901.1010 938.9255 1411.647   100
     armaMatMult(A, B) 846.688 857.6320 915.0717 866.2265 893.7790 1421.557   100
    eigenMatMult(A, B) 205.978 208.1295 233.1882 217.0310 229.4730  369.369   100
 eigenMapMatMult(A, B) 192.366 194.9835 207.1035 197.5405 205.2550  366.945   100
4
votes

To add to cdeterman's answer: You can use eigen's build in parallelization for dense matrix products. In order to do so, you need to compile with open mp activated.

// [[Rcpp::depends(RcppArmadillo, RcppEigen)]]
// [[Rcpp::plugins(openmp)]]

#include <omp.h>
#include <RcppArmadillo.h>
#include <RcppEigen.h>

// [[Rcpp::export]]
SEXP armaMatMult(arma::mat A, arma::mat B){
  arma::mat C = A * B;
  
  return Rcpp::wrap(C);
}

// [[Rcpp::export]]
SEXP eigenMatMult(Eigen::MatrixXd A, 
                  Eigen::MatrixXd B, 
                  int n_cores){
  
  Eigen::setNbThreads(n_cores);
  //qDebug()  << Eigen::nbThreads( );
  Eigen::MatrixXd C = A * B;
  
  return Rcpp::wrap(C);
}

// [[Rcpp::export]]
SEXP eigenMapMatMult2(const Eigen::Map<Eigen::MatrixXd> A,
                      Eigen::Map<Eigen::MatrixXd> B, 
                      int n_cores){
  
  Eigen::setNbThreads(n_cores);
  Eigen::MatrixXd C = A * B;
  return Rcpp::wrap(C);
}

Here are some benchmarks:

Note that if N = k = 100, parallelization does not necessarily improve performance. If the matrix dimensions get larger, parallelization starts to have an impact (N = k = 1000):

library(microbenchmark)

# Benchmark 1: N = k = 100

N <- 100
k <- 100

A <- matrix(rnorm(N*k), N, k)
B <- matrix(rnorm(N*k), k, N)

microbenchmark(A%*%B, 
               armaMatMult2(A, B),
               eigenMatMult2(A, B, n_cores = 1),
               eigenMatMult2(A, B, n_cores = 2),
               eigenMatMult2(A, B, n_cores = 4),
               eigenMapMatMult2(A, B, n_cores = 1),
               eigenMapMatMult2(A, B, n_cores = 2),
               eigenMapMatMult2(A, B, n_cores = 4), 
               times = 100

# Unit: microseconds
#                                 expr   min     lq    mean median     uq   max neval
#                              A %*% B 535.6 540.75 552.594 551.25 554.50 650.2   100
#                   armaMatMult2(A, B) 542.0 549.10 560.975 556.35 560.25 738.1   100
#     eigenMatMult2(A, B, n_cores = 1) 147.1 152.65 159.165 159.65 162.90 180.5   100
#     eigenMatMult2(A, B, n_cores = 2)  97.1 109.90 124.496 119.60 127.50 391.8   100
#     eigenMatMult2(A, B, n_cores = 4)  71.7  88.15 155.220 115.55 216.95 507.3   100
#  eigenMapMatMult2(A, B, n_cores = 1) 139.1 150.10 154.889 154.20 158.35 244.3   100
#  eigenMapMatMult2(A, B, n_cores = 2)  93.4 105.70 116.808 113.55 120.40 323.7   100
#  eigenMapMatMult2(A, B, n_cores = 4)  66.8  82.60 161.516 196.25 210.40 598.9   100
)

# Benchmark 2: N = k = 1000

N <- 1000
k <- 1000

A <- matrix(rnorm(N*k), N, k)
B <- matrix(rnorm(N*k), k, N)

microbenchmark(A%*%B, 
                armaMatMult2(A, B),
                eigenMatMult2(A, B, n_cores = 1),
                eigenMatMult2(A, B, n_cores = 2),
                eigenMatMult2(A, B, n_cores = 4),
               eigenMapMatMult2(A, B, n_cores = 1),
               eigenMapMatMult2(A, B, n_cores = 2),
               eigenMapMatMult2(A, B, n_cores = 4), 
               times = 100
)


Unit: milliseconds
                                expr      min        lq      mean    median        uq
                             A %*% B 597.1293 605.56840 814.52389 665.86650 1025.5896
                  armaMatMult2(A, B) 603.3894 620.25675 830.98947 693.22355 1078.4853
    eigenMatMult2(A, B, n_cores = 1) 131.4696 135.22475 186.69826 193.37870  219.8727
    eigenMatMult2(A, B, n_cores = 2)  67.8948  71.71355 114.52759  74.17380  173.3060
    eigenMatMult2(A, B, n_cores = 4)  41.8564  48.87075  79.55535  72.00705  106.8572
 eigenMapMatMult2(A, B, n_cores = 1) 125.3890 129.26125 175.09933 177.23655  213.0536
 eigenMapMatMult2(A, B, n_cores = 2)  62.2866  65.78785 115.74248  79.92470  167.0217
 eigenMapMatMult2(A, B, n_cores = 4)  35.2977  40.42480  68.21669  63.13655   97.2571
       max neval
 1217.6475   100
 1446.5127   100
  419.2043   100
  217.9513   100
  139.9629   100
  298.2859   100
  230.6307   100
  118.2553   100
-1
votes

why crossprod() is the fastest way in my server?

Unit: milliseconds
                            expr       min        lq     mean   median       uq       max neval
                            A %*% B  9.630260  9.943458 11.54563 10.10754 11.04614 118.87243   100
                    crossprod(A, B)  9.668597  9.846900 10.54206 10.07100 11.05562  18.93269   100
                  armaMatMult(A, B) 12.460832 13.250503 19.07578 20.17903 23.33987  30.40554   100
    eigenMatMult(A, B, n_cores = 1) 57.248345 58.737071 60.26937 60.08885 61.47355  71.58064   100
    eigenMatMult(A, B, n_cores = 2) 31.465149 32.908495 34.27428 34.33102 35.28012  38.14678   100
    eigenMatMult(A, B, n_cores = 4) 18.724495 19.509954 21.49996 20.44093 22.34533  38.44640   100
eigenMapMatMult2(A, B, n_cores = 1) 54.040737 55.628182 57.16926 57.13763 58.41168  67.12156   100
eigenMapMatMult2(A, B, n_cores = 2) 28.965028 30.021924 31.26321 30.87678 32.45385  35.07308   100
eigenMapMatMult2(A, B, n_cores = 4) 16.208926 16.830164 18.32303 17.30694 19.64176  33.19359   100

I am using Microsoft R 4.0. BLAS or LAPACK?

> sessionInfo()
R version 4.0.2 (2020-06-22)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.5 LTS

Matrix products: default
BLAS:   /opt/microsoft/ropen/4.0.2/lib64/R/lib/libRblas.so    
LAPACK: /opt/microsoft/ropen/4.0.2/lib64/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=zh_CN.UTF-8       LC_NUMERIC=C               LC_TIME=zh_CN.UTF-8            LC_COLLATE=zh_CN.UTF-8    
 [5] LC_MONETARY=zh_CN.UTF-8    LC_MESSAGES=zh_CN.UTF-8    LC_PAPER=zh_CN.UTF-8           LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=zh_CN.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] microbenchmark_1.4-7    miraculix_1.0.5         RandomFieldsUtils_1.0.6 RevoUtils_11.0.2        RevoUtilsMath_11.0.0   

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.7               lattice_0.20-44          digest_0.6.27            grid_4.0.2              
 [5] evaluate_0.14            rlang_0.4.11             RcppArmadillo_0.10.6.0.0 Matrix_1.3-4            
 [9] rmarkdown_2.9            tools_4.0.2              RcppEigen_0.3.3.7.0      tinytex_0.32            
[13] nadiv_2.17.1             parallel_4.0.2           xfun_0.24                yaml_2.2.1              
[17] compiler_4.0.2           htmltools_0.5.1.1        knitr_1.33