I am trying to loop over specific numeric columns of a data frame in order to extract correlations and p-values using the "cor.test" function.
Each correlation measures the linear relationship between one categorical variable, made up of 0 and 1 values, and one of the specific numeric columns.
Here's my code so far:
## data ##
names <- c("John", "Greg", "Maria", "Josh", "Emma")
categorical_column <- sample(0:1, 5, replace = TRUE)
numeric_column_1 <- sample(1:30, 5, replace = TRUE)
numeric_column_2 <- sample(1:40, 5, replace = TRUE)
sampled_df <- data.frame(names, categorical_column, numeric_column_1,
                         numeric_column_2)
## specific columns ##
numerical_columns <- c("numeric_column_1", "numeric_column_2")
## for-loop task ##
for(i in seq_along(numerical_columns)){
  correlation_num_df <- structure(list(
    variable <- numerical_columns,
    correlation <- cor.test(sampled_df[numerical_columns[i]][[i]],
                            sampled_df[["categorical_column"]])[["estimate"]][["cor"]],
    p_value <- cor.test(sampled_df[numerical_columns[i]][[i]],
                        sampled_df[["categorical_column"]])[["p.value"]]
  ),
  class = "data.frame",
  nrow = c(NA, -2L))
}
Console output:
Error in .subset2(x, i, exact = exact) : subscript out of bounds
How can I find out which subscript is out of bounds, and how can I fix it?
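
One way to locate the failing subscript is to evaluate the indexing expression on its own for each value of i, outside of "cor.test". The sketch below is only an illustration under some assumptions: it uses fixed, made-up values instead of the sampled data so that it is reproducible, and the names one_col, failed, and results are hypothetical. It suggests that sampled_df[numerical_columns[i]] is a one-column data frame, so [[i]] fails as soon as i is 2, and it shows one possible rewrite that selects each column by name.

```r
# Made-up fixed data (an assumption for reproducibility, not the sampled values)
sampled_df <- data.frame(
  names = c("John", "Greg", "Maria", "Josh", "Emma"),
  categorical_column = c(0, 1, 0, 1, 1),
  numeric_column_1 = c(12, 25, 7, 30, 18),
  numeric_column_2 = c(40, 3, 22, 15, 9)
)
numerical_columns <- c("numeric_column_1", "numeric_column_2")

# Single brackets with a column name return a one-column data frame
one_col <- sampled_df[numerical_columns[2]]
ncol(one_col)

# Evaluate the indexing expression alone for each i and record whether it errors
failed <- vapply(seq_along(numerical_columns), function(i) {
  attempt <- try(sampled_df[numerical_columns[i]][[i]], silent = TRUE)
  inherits(attempt, "try-error")
}, logical(1))
failed

# Possible rewrite: pick each column by name with [[ ]] and bind one row per column
results <- do.call(rbind, lapply(numerical_columns, function(col) {
  test <- cor.test(sampled_df[[col]], sampled_df[["categorical_column"]])
  data.frame(variable = col,
             correlation = unname(test$estimate),
             p_value = test$p.value)
}))
results
```

Here failed marks the values of i for which the subscript cannot be resolved, and results holds one row per numeric column. Calling "cor.test" once per column and reusing the result also avoids running the same test twice.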