I'm trying to create a new variable, e.g. iris$Sepal.Length_above, that holds a numeric, species-dependent classification of another variable, e.g. whether sepal length is at or above (1) or below (0) a cut-off. I'll illustrate using iris.

data("iris")
iris_rm <- subset(iris, Species == 'setosa')
iris_2 <- iris[!(iris$Species %in% iris_rm$Species),] #two species

For variables without species-specific cut-offs I've used the line below:

iris_2$Sepal.Width_above <- ifelse(iris_2$Sepal.Width >= 3.0, 1, 0)#1 is above cut-off

Now I want to do the same, but with species-dependent cut-offs. Assume:

#Species "virginica" has Sepal.Length cut-off: 6.5
#Species "versicolor" has Sepal.Length cut-off: 6.0

The best I've come up with is the below, but there are two problems.

library(dplyr)
iris_2$Sepal.Length_above  <- if (iris_2$Species == 'virginica'){ 
  ifelse(iris_2$Sepal.Length >= 6.5, 1, 0) 
} else (iris_2$Species =='versicolor'){ 
  ifelse(iris_2$Sepal.Length >= 6.0, 1, 0) 
View(iris_2)
#problem 1: 6.0 seems to override the 6.5 for virginica
#problem 2: >= and <= seems to be switched

I would be so grateful for help!

1 Answer

Create a cut-off dataset that holds each species and its respective cut-off value.

library(dplyr)
cut_off_data <- data.frame(Species = c('virginica', 'versicolor'), 
                           cut_off = c(6.5, 6))

cut_off_data
#     Species cut_off
#1  virginica     6.5
#2 versicolor     6.0

Join it with your data (iris_2) and create a new column with 1 for values at or above the cut-off and 0 otherwise.

result <- left_join(iris_2, cut_off_data, by = 'Species') %>%
  mutate(Sepal.Length_above = as.integer(Sepal.Length >= cut_off))

result
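
For comparison, the attempt in the question can be made to work by keeping everything vectorized: `if` only tests a single condition, whereas `ifelse` is evaluated row by row. A minimal sketch, assuming only the two species and cut-offs given in the question (the helper name `species_cut_off` is made up for illustration):

```r
data("iris")
iris_2 <- subset(iris, Species != "setosa")  # keep versicolor and virginica

# Pick the cut-off that applies to each row (assumed: 6.5 for virginica,
# 6.0 for every remaining row, i.e. versicolor), then compare once.
species_cut_off <- ifelse(iris_2$Species == "virginica", 6.5, 6.0)
iris_2$Sepal.Length_above <- as.integer(iris_2$Sepal.Length >= species_cut_off)

# Cross-tabulate species against the new flag as a quick sanity check.
table(iris_2$Species, iris_2$Sepal.Length_above)
```

This works well for two groups; with many species the lookup table plus join scales better than nested `ifelse` calls.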

In base R:

result <- transform(merge(iris_2, cut_off_data, by = 'Species', all.x = TRUE),
                    Sepal.Length_above = as.integer(Sepal.Length >= cut_off))
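
Note that `merge()` sorts the result by the key column by default, so the rows may come back in a different order. If the original row order matters, a named vector can serve as the lookup table without any join. A rough sketch, assuming the same two cut-offs (the vector name `cut_offs` is made up for illustration):

```r
data("iris")
iris_2 <- subset(iris, Species != "setosa")  # keep versicolor and virginica

# Named vector used as a lookup table: names are species, values are cut-offs.
cut_offs <- c(virginica = 6.5, versicolor = 6.0)

# Index by the species *name*. as.character() matters here: indexing with the
# factor itself would use its integer codes rather than its labels.
row_cut_off <- cut_offs[as.character(iris_2$Species)]

iris_2$Sepal.Length_above <- as.integer(iris_2$Sepal.Length >= row_cut_off)

head(iris_2)
```

A species with no entry in `cut_offs` would get `NA` in the new column, which makes missing cut-offs easy to spot.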