Using the dplyr package in R, I'm trying to reduce a categorical variable from three levels to two. I'm using the famous iris data set and trying to turn the class variable (containing "Iris-versicolor", "Iris-setosa", and "Iris-virginica") into one with only two levels ("Iris-versicolor" and "Iris-setosa"). To create a new data set with only those two classes, I've come up with this:

IRIS_TEST2 <- IRIS_TEST %>%
   filter(class != "Iris-virginica")

But when I try to run a hypothesis test on it:

inference(y = sepal_length, x = class, data = IRIS_TEST2, statistic = "mean",
          type = "ci", method = "theoretical", conf_level = .95)

I continue to get an error:

Error: Categorical variable has more than 2 levels, confidence interval is undefined,
         use ANOVA to test for a difference between means

Alternatively, I could use a way to change the "x =" argument so that it includes only "Iris-versicolor" and "Iris-setosa":

inference(y = sepal_length, x = class, data = IRIS_TEST2, statistic = "mean",
          type = "ci", method = "theoretical", conf_level = .95)

Any help would be greatly appreciated!

Have you considered using droplevels()? Just in case, see this question. - jazzurro
Yes - you nailed it, thanks! - min7b5_b9_b11_b13

1 Answer

After filtering out the class I did not want (and storing the result in a new data frame), I was able to run this code:

IRIS_TEST2$class <- factor(IRIS_TEST2$class)

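filter() removes the "Iris-virginica" rows but leaves that level registered on the factor, so the variable still reports three levels. Rebuilding the factor drops the unused level. With only two levels left, I was able to run my hypothesis test and find the confidence interval.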
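
For anyone who wants to see the behaviour directly, here is a minimal sketch. It assumes R's built-in iris data frame rather than my IRIS_TEST data, so the factor column is Species and the level names are "setosa", "versicolor", and "virginica" (not the "Iris-..." names above). It uses base subsetting so it runs without extra packages; dplyr's filter() keeps unused factor levels in the same way. The names two_species, rebuilt, and dropped are just illustrative.

```r
# Built-in iris: `Species` is a factor with three levels.
# Keep only the rows that are not virginica.
two_species <- iris[iris$Species != "virginica", ]

# The rows are gone, but the level is still attached to the factor.
levels(two_species$Species)   # "setosa" "versicolor" "virginica"
table(two_species$Species)    # virginica is listed with a count of 0

# Option 1: rebuild the factor from the values that remain.
rebuilt <- two_species
rebuilt$Species <- factor(rebuilt$Species)

# Option 2: droplevels() removes unused levels from every factor
# column of the data frame in one call.
dropped <- droplevels(two_species)

nlevels(rebuilt$Species)      # 2
nlevels(dropped$Species)      # 2
```

Either option should work here. factor() targets a single column, while droplevels() is convenient when a data frame has several factor columns that may carry unused levels.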