0
votes

In my dataset I have responses to a yes/no question, with a lot of missing values.

The column for the question looks something like this:

Question
[1] yes
[2] no
[3] 
[4] yes
[5] no
[6] 

In other words:

summary(Question)
      yes   no
173   160  155

where we have 173 missing values, 160 yes answers, and 155 no's.

When I look at levels in the factor, I get the following:

levels(Question)
[1] " "
[2] yes
[3] no

I would like to drop the missing values (that is, the level " "). I have legitimate reasons to exclude missing values in this case.

However, is.na(Question) reports (implausibly) that there are no missing values, so I cannot easily exclude them.

I have tried dropping the level with missing values:

droplevels.factor(Question, exclude={" "})

but it results in a "NAs introduced by coercion" warning message.

What can I do to exclude the level with missing values? Please help. Thank you.

Edited with link to data file.

1
is.na only looks for the magic NA value (which is different than a level value). Your "missing" values seem to be a string with a single space in it. So maybe you want Question[Question != " "] possibly followed by droplevels() but it's not totally clear what you are trying to do or if you have any actual NA values. When asking for help, you should include a simple reproducible example with sample input and desired output that can be used to test and verify possible solutions. - MrFlick
Try Question[Question==" "] = NA to convert them to NA - R. Schifini
Thank you, @MrFlick and @R. Schifini. You'd think that either option would work, but they don't seem to do anything. E.g. Question[Question==" "] = NA doesn't actually convert the blank (" ") values to NA. Baffling. I've edited the question with a link to an example (I hope that's how it's done). - KaC
In case anyone stumbles on this in the future, the solution turned out to be the following. Converting " " to NA didn't work, so I downloaded a copy of the dataset with missing values marked as -99. Then Question[Question=="-99"] = NA worked. That correctly marked the missing values as NA, but the level remained in the factor, so I dropped it using Question <- factor(Question). - KaC
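For reference, the two suggestions from the comments can be sketched on made-up data as follows. The sample vector is hypothetical and assumes the blank level is exactly one space, which may not hold for the real file.

```r
# Hypothetical sample data mirroring the question: " " marks a missing answer.
Question <- factor(c("yes", "no", " ", "yes", "no", " "))

# is.na() only detects true NA values, so the " " level is not counted.
sum(is.na(Question))

# Option 1 (MrFlick): keep only the answered rows, then drop the unused level.
answered <- droplevels(Question[Question != " "])

# Option 2 (R. Schifini): recode blanks as NA, then rebuild the factor
# so that the now-empty " " level disappears.
recoded <- Question
recoded[recoded == " "] <- NA
recoded <- factor(recoded)

levels(answered)
levels(recoded)
```

If neither option changes anything on real data, one thing worth checking is whether the blank level really is a single space; inspecting nchar(levels(Question)) would show that.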

1 Answer

0
votes

You can use scan. It expects character input, so convert the factor first:

  scan(text=as.character(Question), what="character", quiet=TRUE)
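A minimal sketch of why scan drops the blanks, using hypothetical sample data: with the default separator, scan splits its input on whitespace, so an element that contains only a space contributes no tokens.

```r
# Hypothetical sample data; " " stands in for a missing answer.
Question <- factor(c("yes", "no", " ", "yes", "no", " "))

# Convert the factor to character before passing it as text.
# With the default sep = "", fields are split on whitespace, so
# elements holding only a space yield nothing.
answers <- scan(text = as.character(Question), what = "character", quiet = TRUE)

# Rebuild a factor from the remaining answers.
answers <- factor(answers)
table(answers)
```

Note that the result is shorter than the input, so it no longer lines up row by row with other columns of the dataset. If that alignment matters, recoding the blanks as NA is probably the safer route.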