2
votes

I am looking to create a correlation matrix using cor() on a data set named "flights" which contains both numeric and non-numeric data. I have partitioned the data using createDataPartition().

# create a data partition
flights_sampling_vector <- caret::createDataPartition(flights$delay, p = 0.8, list = FALSE, times = 1)
# the trailing comma selects rows (all columns) rather than indexing single cells
flights_train <- flights[flights_sampling_vector, ]
flights_test <- flights[-flights_sampling_vector, ]

flights_matrix=cor(flights, y=NULL)

Error in cor(flights, y = NULL) : 'x' must be numeric

My principal problem is that the cor() function does not allow non-numeric data.

How can I create a correlation matrix with data that contains both numeric and non-numeric data?

1
cor() is used to find the correlation coefficient between numeric variables. If you have non-numeric data (such as categories like groupA, groupB, groupC, ..., or logical data, TRUE or FALSE), it might be better to conduct an ANOVA among the groups (or a t-test between two groups). - Xi wa
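
To make the ANOVA suggestion concrete, here is a minimal sketch using base R's aov(). The toy data frame and its column names (delay, carrier) are hypothetical stand-ins, since the real structure of "flights" is not shown in the question.

```r
# Toy data: one numeric response and one categorical grouping column.
# Both column names are assumptions made for illustration only.
toy <- data.frame(
  delay   = c(5, 8, 6, 20, 25, 22, 40, 38, 45),
  carrier = factor(rep(c("AA", "UA", "DL"), each = 3))
)

# One-way ANOVA: does the mean delay differ across carriers?
fit <- aov(delay ~ carrier, data = toy)

# summary() returns a list whose first element is the ANOVA table.
anova_table <- summary(fit)[[1]]
p_value <- anova_table[["Pr(>F)"]][1]
```

A small p_value would suggest that the group means differ. With only two groups, t.test(delay ~ carrier, data = ...) would be the analogous call.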

1 Answer

0
votes

I would check out dplyr::select_if() to subset the numeric columns and then calculate the correlation matrix for those columns.

library(tidyverse)
library(caret)
flights_matrix <- flights %>%
    select_if(is.numeric) %>%
    cor(.)
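
If you would rather avoid the tidyverse dependency, the same idea can be sketched in base R. The toy data frame below is a made-up stand-in for "flights" (its column names are assumptions), and use = "complete.obs" is an optional choice that drops rows containing NA before computing the correlations.

```r
# Toy stand-in for `flights`; the column names are made up for illustration.
flights_toy <- data.frame(
  delay    = c(5, 20, 0, 45, 10, 30),
  distance = c(300, 1200, 150, 2500, 600, 1800),
  carrier  = c("AA", "UA", "AA", "DL", "UA", "DL"),
  stringsAsFactors = FALSE
)

# Flag the columns for which is.numeric() returns TRUE.
numeric_cols <- vapply(flights_toy, is.numeric, logical(1))

# Keep only those columns; drop = FALSE preserves the data frame structure.
flights_numeric <- flights_toy[, numeric_cols, drop = FALSE]

# Pairwise Pearson correlations; "complete.obs" drops rows containing NA.
flights_cor <- cor(flights_numeric, use = "complete.obs")
```

Note that in dplyr 1.0.0 and later, select(where(is.numeric)) is the documented replacement for select_if(is.numeric), which still works but is superseded.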