I have some experience with C++ and R but am new to Rcpp. I recently had great success using Rcpp in some earlier projects, so I decided to apply it to a new one. I was surprised to find that my Rcpp code can be much slower than the corresponding R function. I have tried simplifying my R function to find the reason, but I cannot find any clue. Your help and comments are very welcome!
The main R function to compare R and Rcpp implementations:
    library(microbenchmark)

    main <- function(){
      n <- 50000
      Delta <- exp(rnorm(n))
      delta <- exp(matrix(rnorm(n * 5), nrow = n))
      rx <- matrix(rnorm(n * 20), nrow = n)
      print(microbenchmark(c1 <- test(Delta, delta, rx), times = 500))
      print(microbenchmark(c2 <- rcpp_test(Delta, delta, rx), times = 500))
      # print the comparison; a bare identical() call inside a function is silently discarded
      print(identical(c1, c2))
      list(c1 = c1, c2 = c2)
    }
R implementation:
    test <- function(Delta, delta, rx){
      const <- list()
      for(i in 1:ncol(delta)){
        const[[i]] <- rx * (Delta / (1 + delta[, i]))
      }
      const
    }
Rcpp implementation:
    #include <Rcpp.h>
    using namespace Rcpp;

    // [[Rcpp::export]]
    List rcpp_test(NumericVector Delta,
                   NumericMatrix delta,
                   NumericMatrix rx) {
      int n = Delta.length();
      int m = rx.ncol();
      List c;
      NumericMatrix c1;
      for(int i = 0; i < delta.ncol(); ++i){
        c1 = NumericMatrix(n, m);
        for(int k = 0; k < n; ++k){
          double tmp = Delta[k] / (1 + delta(k, i));
          for(int j = 0; j < c1.ncol(); ++j){
            c1(k, j) = rx(k, j) * tmp;
          }
        }
        c.push_back(c1);
      }
      return c;
    }
I understand that there is no guarantee of improved efficiency by using Rcpp, but given the simple example I show here, I do not see why the Rcpp code runs so slowly.
    Unit: milliseconds
                             expr      min       lq     mean   median       uq      max neval
     c1 <- test(Delta, delta, rx) 13.16935 14.19951 44.08641 30.43126 73.78581 115.9645   500

    Unit: milliseconds
                                  expr      min       lq     mean  median       uq      max neval
     c2 <- rcpp_test(Delta, delta, rx) 143.1917 158.7481 171.6116 163.413 173.7677 247.5495   500
In my actual project, rx is a list of matrices, and the variable i in the for loop is used to pick the element to compute with. At first I suspected that passing a List to Rcpp could carry a high overhead, so in this example I made rx a single fixed matrix that is used for every i. It seems that is not the reason for the slowness.
push_back() can cost quite a lot in terms of performance and should be avoided in applications that require fast execution speeds. It is better to allocate the required memory beforehand. – RHertel
Preallocating c as Ralf Stubner did does not help much. Please refer to my reply to Ralf Stubner's answer. – Han Zhang
I did not say that push_back() is the main reason for the low speed of your code. What I said is that push_back() can cost quite a lot of performance and that it is better to allocate the memory at the beginning. Moreover, I think that it is useful to point out that it is bad programming style to dynamically grow an object within a loop. – RHertel
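For reference, the preallocation that RHertel describes might look roughly like the sketch below. It is plain standard C++ rather than Rcpp, so it compiles without R: the Matrix struct and the name test_prealloc are hypothetical stand-ins for NumericMatrix and rcpp_test, and std::vector<Matrix> stands in for List. The arithmetic is the same as in the question; the only change is that the result container is sized once before the loop and filled in place. Note that, according to Han Zhang's comment, this change alone did not help much in his case, so treat it as an illustration of the suggestion rather than as the fix.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for Rcpp::NumericMatrix: column-major storage,
// element (row, col) lives at data[row + col * nrow].
struct Matrix {
    std::size_t nrow, ncol;
    std::vector<double> data;
    Matrix(std::size_t r, std::size_t c) : nrow(r), ncol(c), data(r * c, 0.0) {}
    double& operator()(std::size_t r, std::size_t c) { return data[r + c * nrow]; }
    double operator()(std::size_t r, std::size_t c) const { return data[r + c * nrow]; }
};

// Same arithmetic as rcpp_test() in the question, but the result container
// is sized once up front and each slot is written in place, not appended.
std::vector<Matrix> test_prealloc(const std::vector<double>& Delta,
                                  const Matrix& delta,
                                  const Matrix& rx) {
    const std::size_t n = Delta.size();
    assert(delta.nrow == n && rx.nrow == n);

    // Allocate all delta.ncol output matrices before the loop starts.
    std::vector<Matrix> out(delta.ncol, Matrix(n, rx.ncol));

    for (std::size_t i = 0; i < delta.ncol; ++i) {
        for (std::size_t k = 0; k < n; ++k) {
            const double tmp = Delta[k] / (1 + delta(k, i));
            for (std::size_t j = 0; j < rx.ncol; ++j) {
                out[i](k, j) = rx(k, j) * tmp;
            }
        }
    }
    return out;
}
```

In the Rcpp version, the corresponding change would be to construct the list with its final length, List c(delta.ncol());, and to assign with c[i] = c1; instead of calling c.push_back(c1).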