Selecting only numeric columns from a data frame

208

votes

Suppose, you have a data.frame like this:

x <- data.frame(v1=1:20,v2=1:20,v3=1:20,v4=letters[1:20])

How would you select only those columns in x that are numeric?

r

322

votes

EDIT: updated to avoid use of ill-advised sapply.

Since a data frame is a list we can use the list-apply functions:

nums <- unlist(lapply(x, is.numeric))

Then standard subsetting

x[ , nums]

## don't use sapply, even though it's less code
## nums <- sapply(x, is.numeric)

For a more idiomatic modern R I'd now recommend

x[ , purrr::map_lgl(x, is.numeric)]

Less codey, less reflecting R's particular quirks, and more straightforward, and robust to use on database-back-ended tibbles:

dplyr::select_if(x, is.numeric)

Newer versions of dplyr, also support the following syntax:

x %>% dplyr::select(where(is.numeric))

88

votes

The dplyr package's select_if() function is an elegant solution:

library("dplyr")
select_if(x, is.numeric)

48

votes

Filter() from the base package is the perfect function for that use-case: You simply have to code:

Filter(is.numeric, x)

It is also much faster than select_if():

library(microbenchmark)
microbenchmark(
    dplyr::select_if(mtcars, is.numeric),
    Filter(is.numeric, mtcars)
)

returns (on my computer) a median of 60 microseconds for Filter, and 21 000 microseconds for select_if (350x faster).

9

votes

in case you are interested only in column names then use this :

names(dplyr::select_if(train,is.numeric))

8

votes

This an alternate code to other answers:

x[, sapply(x, class) == "numeric"]

with a data.table

x[, lapply(x, is.numeric) == TRUE, with = FALSE]

4

votes

iris %>% dplyr::select(where(is.numeric)) #as per most recent updates

Another option with purrr would be to negate discard function:

iris %>% purrr::discard(~!is.numeric(.))

If you want the names of the numeric columns, you can add names or colnames:

iris %>% purrr::discard(~!is.numeric(.)) %>% names

3

votes

The library PCAmixdata has functon splitmix that splits quantitative(Numerical data) and qualitative (Categorical data) of a given dataframe "YourDataframe" as shown below:

install.packages("PCAmixdata")
library(PCAmixdata)
split <- splitmix(YourDataframe)
X1 <- split$X.quanti(Gives numerical columns in the dataset) 
X2 <- split$X.quali (Gives categorical columns in the dataset)

3

votes

library(purrr)
x <- x %>% keep(is.numeric)

1

votes

If you have many factor variables, you can use select_if funtion. install the dplyr packages. There are many function that separates data by satisfying a condition. you can set the conditions.

Use like this.

categorical<-select_if(df,is.factor)
str(categorical)

1

votes

Another way could be as follows:-

#extracting numeric columns from iris datset
(iris[sapply(iris, is.numeric)])

0

votes

Numerical_variables <- which(sapply(df, is.numeric))
# then extract column names 
Names <- names(Numerical_variables)

-1

votes

This doesn't directly answer the question but can be very useful, especially if you want something like all the numeric columns except for your id column and dependent variable.

numeric_cols <- sapply(dataframe, is.numeric) %>% which %>% 
                   names %>% setdiff(., c("id_variable", "dep_var"))

dataframe %<>% dplyr::mutate_at(numeric_cols, function(x) your_function(x))

Selecting only numeric columns from a data frame

12 Answers