I'm working in R and have a dataframe, dd_2006, with numeric vectors. When I first imported the data, I needed to remove $'s, decimal points, and some blank spaces from 3 of my variables: SumOfCost, SumOfCases, and SumOfUnits. To do that, I used str_replace_all
. However, once I used str_replace_all
, the vectors were converted to characters. So I used as.numeric(var) to convert the vectors to numeric, but NAs were introduced, even though when I ran the code below BEFORE I ran the as.numeric code, there were no NAs in the vectors.
sum(is.na(dd_2006$SumOfCost))
[1] 0
sum(is.na(dd_2006$SumOfCases))
[1] 0
sum(is.na(dd_2006$SumOfUnits))
[1] 0
Here is my code from after the import, beginning with removing the $ from the vector. In the str(dd_2006)
output, I deleted some of the variables for the sake of space, so the column #s in the str_replace_all
code below don't match the output I've posted here (but they do in the original code):
library("stringr")
dd_2006$SumOfCost <- str_sub(dd_2006$SumOfCost, 2, ) #2=the first # after the $
#Removes decimal pt, zero's after, and commas
dd_2006[ ,9] <- str_replace_all(dd_2006[ ,9], ".00", "")
dd_2006[,9] <- str_replace_all(dd_2006[,9], ",", "")
dd_2006[ ,10] <- str_replace_all(dd_2006[ ,10], ".00", "")
dd_2006[ ,10] <- str_replace_all(dd_2006[,10], ",", "")
dd_2006[ ,11] <- str_replace_all(dd_2006[ ,11], ".00", "")
dd_2006[,11] <- str_replace_all(dd_2006[,11], ",", "")
str(dd_2006)
'data.frame': 12604 obs. of 14 variables:
$ CMHSP : Factor w/ 46 levels "Allegan","AuSable Valley",..: 1 1 1
$ FY : Factor w/ 1 level "2006": 1 1 1 1 1 1 1 1 1 1 ...
$ Population : Factor w/ 1 level "DD": 1 1 1 1 1 1 1 1 1 1 ...
$ SumOfCases : chr "0" "1" "0" "0" ...
$ SumOfUnits : chr "0" "365" "0" "0" ...
$ SumOfCost : chr "0" "96416" "0" "0" ...
I found a response to a similar question to mine here, using the following code:
# create dummy data.frame
d <- data.frame(char = letters[1:5],
fake_char = as.character(1:5),
fac = factor(1:5),
char_fac = factor(letters[1:5]),
num = 1:5, stringsAsFactors = FALSE)
Let us have a glance at data.frame
> d
char fake_char fac char_fac num
1 a 1 1 a 1
2 b 2 2 b 2
3 c 3 3 c 3
4 d 4 4 d 4
5 e 5 5 e 5
and let us run:
> sapply(d, mode)
char fake_char fac char_fac num
"character" "character" "numeric" "numeric" "numeric"
> sapply(d, class)
char fake_char fac char_fac num
"character" "character" "factor" "factor" "integer"
Now you probably ask yourself "Where's an anomaly?" Well, I've bumped into quite peculiar things in R, and this is not the most confounding thing, but it can confuse you, especially if you read this before rolling into bed.
Here goes: first two columns are character. I've deliberately called 2nd one fake_char. Spot the similarity of this character variable with one that Dirk created in his reply. It's actually a numerical vector converted to character. 3rd and 4th column are factor, and the last one is "purely" numeric.
If you utilize transform function, you can convert the fake_char into numeric, but not the char variable itself.
> transform(d, char = as.numeric(char))
char fake_char fac char_fac num
1 NA 1 1 a 1
2 NA 2 2 b 2
3 NA 3 3 c 3
4 NA 4 4 d 4
5 NA 5 5 e 5
Warning message:
In eval(expr, envir, enclos) : NAs introduced by coercion
but if you do same thing on fake_char and char_fac, you'll be lucky, and get away with no NA's:
transform(d, fake_char = as.numeric(fake_char), char_fac = as.numeric(char_fac))
char fake_char fac char_fac num
1 a 1 1 1 1
2 b 2 2 2 2
3 c 3 3 3 3
4 d 4 4 4 4
5 e 5 5 5 5
So I tried the above code in my script, but still came up with NAs (without a warning message about coercion).
#changing sumofcases, cost, and units to numeric
dd_2006_1 <- transform(dd_2006, SumOfCases = as.numeric(SumOfCases), SumOfUnits = as.numeric(SumOfUnits), SumOfCost = as.numeric(SumOfCost))
> sum(is.na(dd_2006_1$SumOfCost))
[1] 12
> sum(is.na(dd_2006_1$SumOfCases))
[1] 7
> sum(is.na(dd_2006_1$SumOfUnits))
[1] 11
I've also used table(dd_2006$SumOfCases)
etc. to look at the observations to see if there are any characters that I missed in the observations, but there weren't any. Any thoughts on why the NAs are popping up, and how to get rid of them?