3
votes

I'm trying to create a new column in a data frame that will be used as part of a model (using the tree() function).

When I define the new column using ifelse() and then fit the model, tree() warns that NAs were introduced by coercion, and summary() fails.

library(ISLR)
library(tree)

Carseats$High <- ifelse(Carseats$Sales <= 8, "No", "Yes")

> tree.carseats <- tree(High~.-Sales, data = Carseats)
Warning message:
In tree(High ~ . - Sales, data = Carseats) : NAs introduced by coercion
> summary(tree.carseats)
Error in y - frame$yval[object$where] : 
  non-numeric argument to binary operator

Meanwhile, the following code using attach() seems to work properly. No NAs are introduced by coercion, and tree() works as intended.

library(ISLR)
library(tree)

attach(Carseats)
High <- ifelse(Sales <= 8, "No", "Yes")
Carseats <- data.frame(Carseats, High)

> tree.carseats <- tree(High~.-Sales, data = Carseats)
> summary(tree.carseats)

Classification tree:
tree(formula = High ~ . - Sales, data = Carseats)
Variables actually used in tree construction:
[1] "ShelveLoc"   "Price"       "Income"      "CompPrice"   "Population" 
[6] "Advertising" "Age"         "US"         
Number of terminal nodes:  27 
Residual mean deviance:  0.4575 = 170.7 / 373 
Misclassification error rate: 0.09 = 36 / 400

There does not appear to be any difference in the two versions of the High column in Carseats before feeding the data into the tree() function.

Could someone please help me understand why I am getting two different results? Is there a way to get this to work using base ifelse() (or without attach())?

2
Always work without attach()! See library("fortunes"); fortune(379). - jogo

2 Answers

2
votes

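The data.frame() function has the argument stringsAsFactors. In R versions before 4.0.0 it defaults to default.stringsAsFactors(), which returns TRUE unless you have changed the option. Your call to data.frame() therefore converts High to a factor, whereas ifelse() returns a character vector. tree() needs a factor response to fit a classification tree; given a character column, it tries to treat the response as numeric, which is where the "NAs introduced by coercion" warning comes from. (From R 4.0.0 onward the default is stringsAsFactors = FALSE, so the data.frame() version no longer converts the column either.)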

Just use

Carseats$High <- factor(ifelse(Carseats$Sales <= 8, "No", "Yes"))

to get it to work without attaching the data.
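As a minimal sketch of the difference, the following uses a small made-up data frame (df, with hypothetical Sales values) rather than Carseats, and base R only. It shows that ifelse() yields a character vector, that factor() converts it, and that passing stringsAsFactors = TRUE explicitly avoids relying on the version-dependent default.

```r
# Hypothetical stand-in for Carseats; only the Sales column matters here.
df <- data.frame(Sales = c(5.2, 9.1, 8.0, 11.4))

# ifelse() with string outcomes returns a character vector.
high_chr <- ifelse(df$Sales <= 8, "No", "Yes")
class(high_chr)    # "character"

# Wrapping the result in factor() gives a categorical column.
df$High <- factor(high_chr)
class(df$High)     # "factor"
levels(df$High)    # "No" "Yes"

# Setting stringsAsFactors explicitly makes data.frame() convert the
# character column regardless of the R version's default.
df2 <- data.frame(df["Sales"], High = high_chr, stringsAsFactors = TRUE)
is.factor(df2$High)  # TRUE
```

Checking class(Carseats$High) before calling tree() is a quick way to confirm which of the two cases you are in.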

2
votes
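library(tree)
library(tidyverse)
data(Carseats, package = "ISLR")

# High is TRUE when Sales exceeds 8, matching "Yes" in the question.
Carseats <- Carseats %>%
    mutate(High = as.factor(Sales > 8))

# A factor response makes tree() fit a classification tree.
tsmry <- tree(High ~ . - Sales, data = Carseats)
summary(tsmry)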
  • There is no need to load ISLR if you only need a data set from that package; data() with the package argument loads it directly.
  • TRUE / FALSE are better flags than "Yes" / "No" if you think about future control flow. With a logical variable you can write if (bool_var) instead of if (my_flag == "yes").
    • You can then drop the ifelse() call.
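A small sketch of the logical-flag idea, again on a made-up data frame (df and its Sales values are hypothetical). It keeps the flag as a logical column for control flow and builds the factor separately for modelling, since a factor cannot be used directly as an if() condition.

```r
# Hypothetical data; only Sales is needed for the illustration.
df <- data.frame(Sales = c(5.2, 9.1, 8.0, 11.4))

# A plain comparison already returns a logical vector, so no ifelse() is needed.
df$High <- df$Sales > 8

# A logical element can be used directly as a condition.
if (df$High[2]) {
    message("Row 2 is a high-sales store")
}

# For a classification model, derive a factor from the logical column.
high_factor <- factor(df$High)
levels(high_factor)    # "FALSE" "TRUE"

# Converting back goes through the level labels.
as.logical(as.character(high_factor))
```

Keeping both forms separates the two uses: the logical column for filtering and branching, the factor for the model response.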