6
votes

I want to make a subset of my data, by selecting columns, like below:

select(df, col1, col2, col3, col4) 

But sometimes I have a slightly different data set, with only col1, col2 and col4.

How can I use select() so that, if a column doesn't exist, it just continues without giving an error?

So it would give a dataset with col1, col2 and col4 (and skip col3). If I just run the above select() line, I get this error:

Error in overscope_eval_next(overscope, expr) : object 'col3' not found

2 Answers

8
votes
# drop = FALSE keeps the result a data frame even when only one column matches
df[, names(df) %in% c('col1', 'col2', 'col3', 'col4'), drop = FALSE]
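
A closely related base R variant, in case the order of the requested columns matters: the `%in%` form returns columns in the order they appear in the data frame, while `intersect()` keeps the order of the requested names. This is only a sketch on the built-in `mtcars` data; `wanted`, `present` and `subset_df` are illustrative names, and `"foo"` stands in for the missing column.

```r
# Hypothetical list of wanted columns; "foo" does not exist in mtcars.
wanted <- c("disp", "mpg", "foo")

# intersect() keeps only the names found in both vectors,
# in the order given by its first argument.
present <- intersect(wanted, names(mtcars))

# drop = FALSE keeps the result a data frame even if one column remains.
subset_df <- mtcars[, present, drop = FALSE]

names(subset_df)
```

Here `subset_df` should contain `disp` followed by `mpg`, with `foo` skipped silently.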
5
votes

You can use the one_of() select helper from dplyr and pass the column names as strings. It will just issue a warning for columns that don't exist.


library(dplyr)

select(mtcars, one_of(c("mpg", "disp", "foo")))

#> Warning: Unknown variables: `foo`

#>                      mpg  disp
#> Mazda RX4           21.0 160.0
#> Mazda RX4 Wag       21.0 160.0
#> Datsun 710          22.8 108.0
#> Hornet 4 Drive      21.4 258.0
#> Hornet Sportabout   18.7 360.0
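
If you are on a newer dplyr (assuming version 1.0.0 or later, where one_of() is superseded), the any_of() helper should do the same thing without the warning. A minimal sketch using the same example columns; `wanted` and `result` are just illustrative names:

```r
library(dplyr)

# "foo" is a deliberately missing column, as in the example above.
wanted <- c("mpg", "disp", "foo")

# any_of() selects the listed columns that exist and ignores the rest.
result <- select(mtcars, any_of(wanted))

names(result)
```

If a missing column should be treated as an error instead, all_of() is the strict counterpart.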