26
votes

I'm trying to multiply a data frame df by a vector v, so that the product is a data frame, where the i-th row is given by df[i,]*v. I can do this, for example, by

df <- data.frame(A=1:5, B=2:6); v <- c(0,2)
as.data.frame(t(t(df) * v))
   A  B
1  0  4
2  0  6
3  0  8
4  0 10
5  0 12

I am sure there has to be a more R-style approach (and a very simple one!), but nothing comes to mind. I even tried something like

apply(df, MARGIN=1, function(x) x*v)

but unreadable constructions like as.data.frame(t(.)) are still required.
Is there an efficient and elegant way to do this?

6
Why does it need to be a data.frame? If you have all numeric elements it generally makes more sense to use a matrix. – Señor O

6 Answers

36
votes

This works too:

data.frame(mapply(`*`,df,v))

In that solution, you are taking advantage of the fact that data.frame is a type of list, so you can iterate over both the elements of df and v at the same time with mapply.

Unfortunately, you are limited in what you can output from mapply: a simple list or a matrix. If your data are huge, this would likely be more efficient:

data.frame(mapply(`*`,df,v,SIMPLIFY=FALSE))

With SIMPLIFY=FALSE, mapply returns a list, which is more efficient to convert to a data.frame than a matrix is.
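To make the two return shapes concrete, here is a minimal sketch using the df and v from the question. It only inspects what mapply returns in each case and is not a benchmark; the variable names m, l and out are just for illustration.

```r
df <- data.frame(A = 1:5, B = 2:6)
v <- c(0, 2)

# A data.frame is a list of columns, so mapply pairs column i with v[i].
is.list(df)

# Default (SIMPLIFY = TRUE): the equal-length results are bound into a matrix.
m <- mapply(`*`, df, v)
is.matrix(m)

# SIMPLIFY = FALSE: the results stay a list, named after the columns of df.
l <- mapply(`*`, df, v, SIMPLIFY = FALSE)
names(l)

# Either shape can be passed to data.frame(); the list is already column-shaped.
out <- data.frame(l)
out
```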

12
votes

If you're looking for speed and memory efficiency - data.table to the rescue:

library(data.table)
dt = data.table(df)

for (i in seq_along(dt))
  dt[, (i) := dt[[i]] * v[i]]


# Benchmark of the approaches suggested so far
library(microbenchmark)  # provides microbenchmark()

eddi = function(dt) { for (i in seq_along(dt)) dt[, (i) := dt[[i]] * v[i]] }
arun = function(df) { df * matrix(v, ncol=ncol(df), nrow=nrow(df), byrow=TRUE) }
nograpes = function(df) { data.frame(mapply(`*`,df,v,SIMPLIFY=FALSE)) }

N = 1e6
dt = data.table(A = rnorm(N), B = rnorm(N))
v = c(0,2)

microbenchmark(eddi(copy(dt)), arun(copy(dt)), nograpes(copy(dt)), times = 10)
#Unit: milliseconds
#               expr       min        lq      mean    median        uq       max neval
#     eddi(copy(dt))  23.01106  24.31192  26.47132  24.50675  28.87794  34.28403    10
#     arun(copy(dt)) 337.79885 363.72081 450.93933 433.21176 516.56839 644.70103    10
# nograpes(copy(dt))  19.44873  24.30791  36.53445  26.00760  38.09078  95.41124    10

As Arun points out in the comments, one can also use the set function from the data.table package to do this in-place modification on data.frames as well:

for (i in seq_along(df))
  set(df, j = i, value = df[[i]] * v[i])

This of course also works for data.tables and could be significantly faster if the number of columns is large.

8
votes

A language that lets you combine vectors with matrices has to decide at some point whether the matrices are row-major or column-major ordered. The reason you get

> df * v
  A  B
1 0  4
2 4  0
3 0  8
4 8  0
5 0 12

is that R operates down the columns first, recycling v as it goes. Doing the double-transpose trick subverts this. Sorry if this is just explaining what you know, but I don't know another way of doing it, except explicitly expanding v into a matrix of the same size.
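A small sketch of what "down the columns first" means, using the df and v from the question. The names recycled and expanded are only for illustration: the first reproduces how v gets recycled, the second is the explicit row-wise expansion mentioned above.

```r
df <- data.frame(A = 1:5, B = 2:6)
v <- c(0, 2)

# v is recycled to length nrow * ncol and laid out column by column,
# so 0 and 2 alternate down each column instead of across each row.
recycled <- matrix(rep(v, length.out = nrow(df) * ncol(df)), nrow = nrow(df))
recycled

# Multiplying by this matrix gives the same values as df * v.
df * recycled

# Filling by row instead puts v[1] in column 1 and v[2] in column 2.
expanded <- matrix(v, nrow = nrow(df), ncol = ncol(df), byrow = TRUE)
df * expanded
```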

Or write a nice function that wraps the not very R-style code into something that is R-stylish.
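Such a wrapper could look roughly like this. The name scale_cols is made up for this sketch, and it assumes v has exactly one entry per column of df; it uses Map over the columns rather than the double transpose.

```r
# Hypothetical helper: multiply column i of df by v[i], keeping df a data.frame.
scale_cols <- function(df, v) {
  stopifnot(length(v) == ncol(df))  # assumption: one multiplier per column
  df[] <- Map(`*`, df, v)           # df[] keeps the class, names and row names
  df
}

df <- data.frame(A = 1:5, B = 2:6)
v <- c(0, 2)
res <- scale_cols(df, v)
res
```

Calling code then reads as scale_cols(df, v), and the less readable part stays hidden in one place.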

4
votes

What's wrong with

t(apply(df, 1, function(x)x*v))

?
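One thing worth checking with this form is the type of the result. A quick sketch with the df and v from the question (res is just an illustrative name): apply works on a matrix, so what comes back is a matrix and a conversion is still needed if a data.frame is required.

```r
df <- data.frame(A = 1:5, B = 2:6)
v <- c(0, 2)

# apply() converts df to a matrix and returns one column per input row,
# hence the t() to get back to the original orientation.
res <- t(apply(df, 1, function(x) x * v))
is.matrix(res)
is.data.frame(res)

# Convert explicitly if a data.frame is needed.
res_df <- as.data.frame(res)
res_df
```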

3
votes

library(purrr)

map2_dfc(df, v, `*`)

Benchmark

library(data.table)  # for data.table() and copy()

N = 1e6
dt = data.table(A = rnorm(N), B = rnorm(N))
v = c(0,2)

eddi = function(dt) { for (i in seq_along(dt)) dt[, (i) := dt[[i]] * v[i]]; dt }
arun = function(df) { df * matrix(v, ncol=ncol(df), nrow=nrow(df), byrow=TRUE) }
nograpes = function(df) { data.frame(mapply(`*`,df,v,SIMPLIFY=FALSE)) }
ryan = function(df) {map2_dfc(df, v, `*`) }
library(microbenchmark)
microbenchmark(
  eddi(copy(dt))
  , arun(copy(dt))
  , nograpes(copy(dt))
  , ryan(copy(dt))
  , times = 100)


# Unit: milliseconds
# expr                     min        lq      mean    median        uq      max neval
# eddi(copy(dt))      8.367513  11.06719  24.26205  12.29132  19.35958 171.6212   100
# arun(copy(dt))     94.031272 123.79999 186.42155 148.87042 251.56241 364.2193   100
# nograpes(copy(dt))  7.910739  10.92815  27.68485  13.06058  21.39931 172.0798   100
# ryan(copy(dt))      8.154395  11.02683  29.40024  13.73845  21.77236 181.0375   100

1
votes

I think the fastest way (without testing data.table) is data.frame(t(t(df)*v)).

My tests:

testit <- function(nrow, ncol)
{
    df <- as.data.frame(matrix(rnorm(nrow*ncol),nrow=nrow,ncol=ncol))

    v <- runif(ncol)

    r1 <- data.frame(t(t(df)*v))
    r2 <- data.frame(mapply(`*`,df,v,SIMPLIFY=FALSE))
    r3 <- df * rep(v, each=nrow(df))

    stopifnot(identical(r1, r2) && identical(r1, r3))

    microbenchmark(data.frame(t(t(df)*v)), data.frame(mapply(`*`,df,v,SIMPLIFY=FALSE)), df * rep(v, each=nrow(df)))
}

Result

> set.seed(1)
> 
> testit(100,100)
Unit: milliseconds
                                             expr       min        lq    median        uq      max neval
                         data.frame(t(t(df) * v))  2.297075  2.359541  2.455778  3.804836 33.05806   100
 data.frame(mapply(`*`, df, v, SIMPLIFY = FALSE))  9.977436 10.401576 10.658964 11.762009 15.09721   100
                     df * rep(v, each = nrow(df)) 14.309822 14.956705 16.092469 16.516609 45.13450   100
> testit(1000,10)
Unit: microseconds
                                             expr      min       lq   median       uq      max neval
                         data.frame(t(t(df) * v))  754.844  805.062  844.431 1850.363 27955.79   100
 data.frame(mapply(`*`, df, v, SIMPLIFY = FALSE)) 1457.895 1497.088 1567.604 2550.090  4732.03   100
                     df * rep(v, each = nrow(df)) 5383.288 5527.817 5875.143 6628.586 32392.81   100
> testit(10,1000)
Unit: milliseconds
                                             expr       min        lq    median        uq       max neval
                         data.frame(t(t(df) * v))  17.07548  18.29418  19.91498  20.67944  57.62913   100
 data.frame(mapply(`*`, df, v, SIMPLIFY = FALSE))  99.90103 104.36028 108.28147 114.82012 150.05907   100
                     df * rep(v, each = nrow(df)) 112.21719 118.74359 122.51308 128.82863 164.57431   100