0
votes

I'm using a custom function inside dplyr's mutate and, unexpectedly, each new column comes back as a nested data.frame rather than a plain numeric vector. I want to take one or more columns and transform each of them with a custom function.

# Dummy data:
library(tidyverse)
set.seed(101)
dummy_data <- tibble(mfc = 1:15, value1 = runif(15), value2 = runif(15))

Desired output:

> dummy_data %>% mutate_at(vars(value1,value2),funs(trans = mutate_transform(.)))

    # A tibble: 15 x 5
     mfc     value1     value2 trans_value1 trans_value2
   <int>      <dbl>      <dbl>        <dbl>        <dbl>
 1     1 0.37219838 0.59031973  -0.68738175    0.2211214
 2     2 0.04382482 0.82043609  -2.01936309    0.9903636
 3     3 0.70968402 0.22411848   0.68156092   -1.0030308
 4     4 0.65769040 0.41166683   0.47065924   -0.3760867
 5     5 0.24985572 0.03861056  -1.18364013   -1.6231541
 6     6 0.30005483 0.70071155  -0.98001753    0.5901436
 7     7 0.58486663 0.95683746   0.17526426    1.4463316
 8     8 0.33346714 0.21335200  -0.84448721   -1.0390214
 9     9 0.62201196 0.66106150   0.32593685    0.4575998
10    10 0.54582855 0.92331888   0.01691417    1.3342843
11    11 0.87979573 0.79571976   1.37158488    0.9077409
12    12 0.70687474 0.07121255   0.67016565   -1.5141709
13    13 0.73197259 0.38940777   0.77197005   -0.4504952
14    14 0.93163443 0.40645122   1.58185814   -0.3935216
15    15 0.45512059 0.65935508  -0.35102444    0.4518955

What I get instead:

   # A tibble: 15 x 5
     mfc     value1     value2           value1_trans           value2_trans
   <int>      <dbl>      <dbl>           <data.frame>           <data.frame>
 1     1 0.37219838 0.59031973 <data.frame [15 x 15]> <data.frame [15 x 15]>
 2     2 0.04382482 0.82043609 <data.frame [15 x 15]> <data.frame [15 x 15]>
 3     3 0.70968402 0.22411848 <data.frame [15 x 15]> <data.frame [15 x 15]>
 4     4 0.65769040 0.41166683 <data.frame [15 x 15]> <data.frame [15 x 15]>
 5     5 0.24985572 0.03861056 <data.frame [15 x 15]> <data.frame [15 x 15]>
 6     6 0.30005483 0.70071155 <data.frame [15 x 15]> <data.frame [15 x 15]>
 7     7 0.58486663 0.95683746 <data.frame [15 x 15]> <data.frame [15 x 15]>
 8     8 0.33346714 0.21335200 <data.frame [15 x 15]> <data.frame [15 x 15]>
 9     9 0.62201196 0.66106150 <data.frame [15 x 15]> <data.frame [15 x 15]>
10    10 0.54582855 0.92331888 <data.frame [15 x 15]> <data.frame [15 x 15]>
11    11 0.87979573 0.79571976 <data.frame [15 x 15]> <data.frame [15 x 15]>
12    12 0.70687474 0.07121255 <data.frame [15 x 15]> <data.frame [15 x 15]>
13    13 0.73197259 0.38940777 <data.frame [15 x 15]> <data.frame [15 x 15]>
14    14 0.93163443 0.40645122 <data.frame [15 x 15]> <data.frame [15 x 15]>
15    15 0.45512059 0.65935508 <data.frame [15 x 15]> <data.frame [15 x 15]>

Here's my custom function:

mutate_transform <- function(x){
  require(caret)
  trans <- preProcess(data.frame(x), c("BoxCox", "center", "scale"))
  data_trans <- data.frame(trans = predict(trans, data.frame(x)))
  return(data_trans)
}

Am I using mutate incorrectly, or should I change my custom function mutate_transform?

2 Answers

1
votes

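Your custom function should return a plain vector, not a data.frame. mutate stores whatever the function returns as the new column, so a data.frame ends up nested inside the tibble. For example: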

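mutate_transform <- function(x){
  require(caret)
  # Fit the Box-Cox, centering, and scaling steps on a one-column data.frame
  trans <- preProcess(data.frame(x), c("BoxCox", "center", "scale"))
  # predict() returns a data.frame; pull out its single column (named "x") as a vector
  predict(trans, data.frame(x))$x
}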
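
If you are on a newer dplyr (1.0 or later, which is an assumption about your setup), the same idea can be written with across() instead of mutate_at()/funs(). This is only a sketch: it reuses the vector-returning mutate_transform from above, and the .names pattern is my choice so the new columns come out as trans_value1 and trans_value2, as in the desired output.

```r
library(dplyr)
library(caret)

# Vector-returning version of the transform (same as in the answer above)
mutate_transform <- function(x) {
  trans <- preProcess(data.frame(x), method = c("BoxCox", "center", "scale"))
  predict(trans, data.frame(x))$x
}

set.seed(101)
dummy_data <- tibble(mfc = 1:15, value1 = runif(15), value2 = runif(15))

# Apply the function to both columns; .names controls the new column names
out <- dummy_data %>%
  mutate(across(c(value1, value2), mutate_transform, .names = "trans_{.col}"))

print(out)
```

Because the function now returns a numeric vector of the same length as its input, each new column should be an ordinary <dbl> column.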
0
votes

Why not just call lapply on your columns?

dummy_data[, c("trans_value1", "trans_value2")] <- lapply(dummy_data[, 2:3], mutate_transform)
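
Note that this relies on the function returning a plain vector for each column. Here is a small base-R sketch of the pattern; zscore is a hypothetical stand-in (center and scale only, no Box-Cox) so the example runs without caret, and it is not the same transform as mutate_transform.

```r
# Hypothetical stand-in for mutate_transform: centers and scales, returns a vector
zscore <- function(x) (x - mean(x)) / sd(x)

set.seed(101)
dummy_data <- data.frame(mfc = 1:15, value1 = runif(15), value2 = runif(15))

# lapply returns a list with one vector per input column;
# assigning that list to new column names adds them to the data.frame
new_cols <- c("trans_value1", "trans_value2")
dummy_data[new_cols] <- lapply(dummy_data[c("value1", "value2")], zscore)

str(dummy_data)
```

Selecting the input columns by name rather than by position (2:3) is a little more robust if the column order ever changes.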