1
votes

I want to write a function that adds a new variable to a data frame. The new variable consists of the concatenation of the values of a set of variables passed as an argument (as a vector of strings). In base R, I would write something like:

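addConcatFields <- function(data, listOfVar)
{
  # Start from the first variable, then append the remaining ones
  data$uniqueId <- data[, listOfVar[1]]
  # listOfVar[-1] is empty when only one variable is given, so the loop is skipped
  for (elt in listOfVar[-1])
  {
    data$uniqueId <- paste(data$uniqueId, data[, elt], sep = '_')
  }
  return(data)
}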

addConcatFields(iris,c('Petal.Width','Species'))

# gives:
      Sepal.Length Sepal.Width Petal.Length Petal.Width Species   uniqueId
1          5.1         3.5          1.4         0.2  setosa 0.2_setosa
2          4.9         3.0          1.4         0.2  setosa 0.2_setosa
...

My initial goal was to do this with dplyr::mutate, and although I read the programming vignette (http://127.0.0.1:31671/library/dplyr/doc/programming.html), I did not manage to get it working. Because I want to understand what I missed, I would like to solve the problem using mutate, and I would appreciate suggestions.

5 Answers

1
votes

The best way to tackle this is to use quasiquotation. This article is really helpful in explaining the fundamentals:

https://dplyr.tidyverse.org/articles/programming.html

Rather than storing the column names as strings, the best option is to store them as quoted expressions (quosures), thus:

varlist <- rlang::quos(Petal.Width, Species)

That line gives you a list of 2 quosures: one referring to the column Petal.Width and one to Species.

You then want to use !!! to splice the list of quosures into the dplyr call (!!! rather than !! because you're splicing more than one expression).

dplyr::select(iris, !!! varlist)

This selects the two columns, and the same splicing works inside other dplyr verbs.
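To connect this back to the question, here is a minimal sketch that keeps the column names as a character vector, converts them to symbols with rlang::syms(), and splices them into paste() inside mutate(). The function name add_concat_fields_syms and the column name uniqueId are illustrative choices, not part of dplyr or rlang.

```r
library(dplyr)

# Hypothetical helper: concatenates the columns named in `list_of_var`
# (a character vector) into a new column called `uniqueId`.
add_concat_fields_syms <- function(data, list_of_var) {
  # syms() turns each string into a symbol
  cols <- rlang::syms(list_of_var)
  # !!! splices the symbols into paste() as separate arguments
  mutate(data, uniqueId = paste(!!!cols, sep = "_"))
}

res_syms <- add_concat_fields_syms(iris, c("Petal.Width", "Species"))
head(res_syms)
```

Because the symbols are built from strings at call time, the caller can pass any number of column names.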

0
votes

Using data.table, I do something like this:

library(data.table)
iris <- data.table(iris)

iris[, uniqueId := do.call(function(...) paste(..., sep = "_"),.SD), .SDcols = c('Petal.Width','Species')]
0
votes

Check out the unite function in tidyr. It's part of the tidyverse, the same group of packages that includes dplyr.

library(tidyr)
unite(iris,uniqueID,c(Petal.Width,Species))
#    Sepal.Length Sepal.Width Petal.Length       uniqueID
#1            5.1         3.5          1.4     0.2_setosa
#2            4.9         3.0          1.4     0.2_setosa
#3            4.7         3.2          1.3     0.2_setosa
#4            4.6         3.1          1.5     0.2_setosa

If you don't want to lose the two columns you concatenated, just include remove = FALSE:

unite(iris,uniqueID,c(Petal.Width,Species),remove = FALSE)
#    Sepal.Length Sepal.Width Petal.Length       uniqueID Petal.Width    Species
#1            5.1         3.5          1.4     0.2_setosa         0.2     setosa
#2            4.9         3.0          1.4     0.2_setosa         0.2     setosa
#3            4.7         3.2          1.3     0.2_setosa         0.2     setosa
#4            4.6         3.1          1.5     0.2_setosa         0.2     setosa
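Since the question passes the column names as a vector of strings, the unite call can be wrapped in a function. This is only a sketch: the function name add_concat_fields_unite is made up for illustration, and it assumes a tidyr version that accepts tidyselect helpers such as all_of() in unite().

```r
library(tidyr)

# Hypothetical helper: `list_of_var` is a character vector of column names.
add_concat_fields_unite <- function(data, list_of_var) {
  # all_of() tells tidyselect to treat the character vector as column names
  # and to fail with an error if any of them is missing.
  unite(data, "uniqueId", tidyselect::all_of(list_of_var),
        sep = "_", remove = FALSE)
}

res_unite <- add_concat_fields_unite(iris, c("Petal.Width", "Species"))
head(res_unite)
```

With remove = FALSE the original columns are kept alongside the new one.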
0
votes

To add to the other answers: since you said that you want to do it using dplyr's mutate, here is a way to do it in mutate, using paste:

library(dplyr)
iris %>% mutate(uniqueId = paste(Petal.Width, Species, sep = '_'))
# gives the following result:
     Sepal.Length Sepal.Width Petal.Length Petal.Width Species uniqueId
 1          5.1         3.5          1.4         0.2 setosa  0.2_setosa
 2          4.9         3            1.4         0.2 setosa  0.2_setosa
 3          4.7         3.2          1.3         0.2 setosa  0.2_setosa
 4          4.6         3.1          1.5         0.2 setosa  0.2_setosa
 5          5           3.6          1.4         0.2 setosa  0.2_setosa
 6          5.4         3.9          1.7         0.4 setosa  0.4_setosa
 7          4.6         3.4          1.4         0.3 setosa  0.3_setosa
 8          5           3.4          1.5         0.2 setosa  0.2_setosa
 9          4.4         2.9          1.4         0.2 setosa  0.2_setosa
10          4.9         3.1          1.5         0.1 setosa  0.1_setosa
...

If you have a custom function, you can vectorize it with Vectorize and then use it inside mutate (paste is already vectorized, so this is only needed for functions that are not). For example, this leads to the same result as above:

concat_fields <- function(var1, var2) {
  return(paste(var1, var2, sep = '_'))
}
v_concat_fields <- Vectorize(concat_fields)
# Name the new column explicitly; otherwise it is named after the call itself
iris %>% mutate(uniqueId = v_concat_fields(Petal.Width, Species))

The function passed to mutate is applied to columns of the data frame, so its arguments are vectors, not data frames.
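Building on that point, one possible way to handle an arbitrary vector of column names is to select the columns with across() and hand them to paste() via do.call(). This is a sketch that assumes dplyr 1.0.0 or later (where across() and all_of() are available). The function name add_concat_fields_across is hypothetical.

```r
library(dplyr)

# Hypothetical helper: `list_of_var` is a character vector of column names.
add_concat_fields_across <- function(data, list_of_var) {
  mutate(
    data,
    # across() returns the selected columns as a data frame (a list of
    # vectors); c(..., sep = "_") appends the separator so that do.call()
    # passes every column plus `sep` as arguments to paste().
    uniqueId = do.call(paste, c(across(all_of(list_of_var)), sep = "_"))
  )
}

res_across <- add_concat_fields_across(iris, c("Petal.Width", "Species"))
head(res_across)
```

One caveat: because the column names become argument names in the paste() call, a column literally named sep or collapse would clash with this approach.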

0
votes

OK, having thought about it, here is another solution.

Convert the column names (strings) to column numbers by using the match function.

Then use the column numbers like so (replacing the numeric vector in the example with the result of match):

df <- tibble::as_tibble(df[c(3, 4, 7, 1, 9, 8, 5, 2, 6, 10)])

This also has the benefit that if match returns NA for any name (meaning the column was not found), you can abort the function with an error.
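Applied to the original question, the match idea might look like the sketch below. It uses base R only. The function name add_concat_fields_match and the error message are illustrative assumptions, not something from the answer above.

```r
# Hypothetical helper: `list_of_var` is a character vector of column names.
add_concat_fields_match <- function(data, list_of_var) {
  # match() returns the position of each name, or NA when it is not found
  idx <- match(list_of_var, names(data))
  if (anyNA(idx)) {
    stop("Unknown column(s): ",
         paste(list_of_var[is.na(idx)], collapse = ", "))
  }
  # Pass the selected columns, unnamed, to paste() together with `sep`
  data$uniqueId <- do.call(paste, c(unname(as.list(data[idx])), sep = "_"))
  data
}

res_match <- add_concat_fields_match(iris, c("Petal.Width", "Species"))
head(res_match)
```

Calling it with a name that is not in the data frame stops with an error instead of silently producing NA columns.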