1
votes

I am working with the built-in diamonds data set from ggplot2. After assigning it to dia1, I ran the following script to group the values by carat. I got an error, along with warnings about NAs being introduced by coercion. I do not understand how that could have happened, since sum(is.na(dia1$carat)) returns zero.

#data
library(ggplot2)
dia1 <- diamonds

#logic
x<-1
dia1$carat<-as.character(dia1$carat)
for (i in 1:(length(dia1$carat))){

if (0<(as.numeric(dia1$carat[x]))&(as.numeric(dia1$carat[x]))<=1){
  dia1$carat[x]<-"0-1"
}
if (1 < (as.numeric(dia1$carat[x]))&(as.numeric(dia1$carat[x])) <= 2){
  dia1$carat[x]<-"1-2"
}
if (2<(as.numeric(dia1$carat[x]))&(as.numeric(dia1$carat[x]))<=3){
  dia1$carat[x]<-"2-3"
}
if (3<(as.numeric(dia1$carat[x]))&(as.numeric(dia1$carat[x]))<=4){
  dia1$carat[x]<-"3-4"
}
if (4<(as.numeric(dia1$carat[x]))&(as.numeric(dia1$carat[x]))<=5){
  dia1$carat[x]<-"4-5"
}
  x<-x+1
}

Error in if (0 < (as.numeric(dia1$carat[x])) & (as.numeric(dia1$carat[x])) < :
  missing value where TRUE/FALSE needed
In addition: Warning messages:
1: NAs introduced by coercion
2: NAs introduced by coercion

# check if there are any NAs in the data
sum(is.na(dia1$carat))
[1] 0

Put differently: why were no NAs introduced when the dia1$carat vector was explicitly coerced to character, yet NAs were introduced when converting back to numeric?

1
cut(dia$carat, breaks=0:6, labels=c("0-1", "1-2", "2-3", "3-4", "4-5", "5-6"), include.lowest=TRUE) will probably be a great deal more efficient. And, you should probably do a range(dia$carat) to validate your assumptions. - hrbrmstr
@hrbrmstr Thanks. Will use. But as a learning objective, I wanted to know what went wrong with my code. - stochastic13
As I said, you might want to do a range(dia$carat) to validate your assumptions. And for + if == python/C/Java, not R. - hrbrmstr
It is 0.2, 5.01. Before coercing the vector to character, that is. - stochastic13
@hrbrmstr What would then be the equivalent of for+if in R, in case there is no function (like cut here) to do the operation? - stochastic13

1 Answer

2
votes

The problem is that NA is being passed to if(). Try this example:

if(NA > 1){1} else {2}

Error in if (NA > 1) { : missing value where TRUE/FALSE needed

In your case, take the first row, where carat is "0.23". The first if() converts it to a number, finds it within range, and assigns the new value "0-1". The second if() then tries to convert "0-1" to numeric, which gives NA (hence the coercion warning), and if() fails because its condition is NA. The string "0-1" is not itself NA, which is why sum(is.na(dia1$carat)) still returns 0.
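To make that sequence concrete, here is a minimal sketch that replays the first loop iteration by hand. The variable names (carat_chr, first_test, and so on) are made up for illustration, and a single string stands in for dia1$carat[1], so ggplot2 is not needed to run it.

```r
# Hypothetical single value standing in for dia1$carat[1]
carat_chr <- "0.23"

# First if(): the string converts cleanly, so the condition is TRUE
first_test <- 0 < as.numeric(carat_chr) & as.numeric(carat_chr) <= 1

# The body of the first if() overwrites the value with a label
carat_chr <- "0-1"

# The label is a perfectly valid string, so is.na() reports FALSE ...
label_is_na <- is.na(carat_chr)

# ... but it is not a number, so as.numeric() returns NA.
# (suppressWarnings() only hides the "NAs introduced by coercion" warning)
converted <- suppressWarnings(as.numeric(carat_chr))

# Comparing NA gives NA, which if() cannot treat as TRUE or FALSE
second_test <- 1 < converted & converted <= 2
```

Passing second_test to if() would raise the same "missing value where TRUE/FALSE needed" error as in the question.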

Other advice about the code:

  • Just use cut().
  • Use if(){...} else if(){...} ... so that only one branch runs per value.
  • The x variable is unnecessary; use the loop index i instead.
  • Read about seq_along(), summary(), and str().
  • Store as.numeric(dia1$carat[x]) in a temporary variable.
  • Use whitespace.
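
Putting that advice together, a rough sketch could look like the following. It is one possible rewrite, not the only one. The carat vector here is a small made-up sample so the snippet runs without ggplot2 (with the real data it would be diamonds$carat), and carat_group, value and carat_cut are illustrative names. The loop reads from the untouched numeric vector and writes labels into a separate vector, so a label is never converted back to a number.

```r
# Hypothetical sample; with ggplot2 loaded this would be diamonds$carat
carat <- c(0.23, 1.00, 1.52, 2.75, 3.40, 5.01)

# Loop version: labels go into a separate vector, initialised to NA
carat_group <- rep(NA_character_, length(carat))
for (i in seq_along(carat)) {
  value <- carat[i]  # temporary variable, read once per iteration
  if (value > 0 && value <= 1) {
    carat_group[i] <- "0-1"
  } else if (value <= 2) {
    carat_group[i] <- "1-2"
  } else if (value <= 3) {
    carat_group[i] <- "2-3"
  } else if (value <= 4) {
    carat_group[i] <- "3-4"
  } else if (value <= 5) {
    carat_group[i] <- "4-5"
  } else if (value <= 6) {
    carat_group[i] <- "5-6"
  }
}

# Vectorised version: cut() uses right-closed intervals (0,1], (1,2], ...
carat_cut <- cut(carat, breaks = 0:6,
                 labels = c("0-1", "1-2", "2-3", "3-4", "4-5", "5-6"))
```

Both versions should give the same grouping for this sample. The "5-6" bin is included because range() on the original data shows a maximum of 5.01, which the bins in the question would miss.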