0
votes
  1. I loaded my dataset (original.csv) into R: original <- read.csv("original.csv")
  2. str(original) showed that my dataset has 16 variables (14 factors, 2 integers). 14 variables have missing values. That is fine, but 3 variables that are actually numeric were read as factors.
  3. I searched the web and found this command: as.numeric(as.character(original$Tumor_Size)) (Tumor_Size is one of the variables that was read as a factor).
  4. By the way, missing values in my dataset are marked with a dot (.).
  5. After running as.numeric(as.character(original$Tumor_Size)), the values of Tumor_Size were printed, followed by the warning message “NAs introduced by coercion”.
  6. I expected the variable to be converted to numeric after running the command above, but a second str(original) showed that Tumor_Size and the other two variables were still factors. Below is a sample of my dataset (image: a piece of my dataset).

How can I solve my problem?

2 Answers

6
votes

The crucial information here is how missing values are encoded in your data file. The corresponding argument in read.csv() is called na.strings. So if dots are used:

original <- read.csv("original.csv", na.strings = ".")
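To see the effect without the original file, here is a minimal sketch that reads a small inline CSV through the text argument of read.csv(). Only the column name Tumor_Size comes from the question; the other column names and all values are made up for illustration.

```r
# Hypothetical data: only Tumor_Size is taken from the question; values are invented.
csv_text <- "ID,Tumor_Size,Grade
1,2.5,1
2,.,2
3,4,.
4,1.2,3"

# Default settings: the dots are ordinary text, so the affected columns are
# not numeric (factor in R < 4.0.0, character in R >= 4.0.0).
raw <- read.csv(text = csv_text)
str(raw)

# With na.strings = ".", the dots become NA and the columns are parsed as numbers.
clean <- read.csv(text = csv_text, na.strings = ".")
str(clean)
```

Setting na.strings replaces the default value "NA", so if the file also contains literal NA entries, na.strings = c(".", "NA") should cover both.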
0
votes

I'm not 100% sure what your problem is, but maybe this will help:

original <- read.csv("original.csv", header = TRUE, stringsAsFactors = FALSE)
original$Tumor_Size <- as.numeric(original$Tumor_Size)

This introduces NAs because the dot (.) cannot be converted to a numeric value. If you then replace the NAs with a dot again, the column becomes character. To do that, you can use:

original$Tumor_Size[is.na(original$Tumor_Size)] <- "."
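The behaviour described above can be checked with a small sketch. The vector below is an invented stand-in for original$Tumor_Size as read with stringsAsFactors = FALSE, and the names size, size_num and size_back are only for illustration.

```r
# Invented stand-in for original$Tumor_Size read as character.
size <- c("2.5", ".", "4")

# as.numeric() returns a new vector; without assignment the input is unchanged.
# suppressWarnings() hides the expected "NAs introduced by coercion" warning.
size_num <- suppressWarnings(as.numeric(size))
class(size)      # "character"
class(size_num)  # "numeric"
is.na(size_num)  # FALSE TRUE FALSE

# Writing "." back coerces the whole vector to character again.
size_back <- size_num
size_back[is.na(size_back)] <- "."
class(size_back) # "character"
```

This is also why a conversion that is only printed to the console, without being assigned back to the column, leaves str(original) unchanged.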

Hope this helps.