I am forming a data.frame from character data that is not under my control (from an API). I would like the resulting variables to get their most natural class with minimal fuss. Specifically, I want integer variables, not numeric, when appropriate.
I am digging this data out of XML and one attribute -- let's call it `attA` -- presents integers as integers, i.e. with no period and trailing zero. Another attribute -- let's call it `attB` -- is more generally useful and correct, but always presents numbers with one decimal place, even if that is uniformly zero. (The data could also be character, mind you!)
My initial approach was based on `attA` and processing through `type.convert()`, but now I want to use `attB`. From reading the `type.convert()` docs, I'm surprised it does not produce integers when all the data could be represented as integer. Am I misreading that? Any suggestions on how to get what I want without doing some unholy processing of the character data?
```r
attA <- c("1", "2")
str(type.convert(attA))
#> int [1:2] 1 2
attB <- c("1.0", "2.0")
str(type.convert(attB))
#> num [1:2] 1 2
unholy <- gsub("\\.0$", "", attB)
str(type.convert(unholy))
#> int [1:2] 1 2
```
Relevant bit of `type.convert()` docs: "Given a character vector, it attempts to convert it to logical, integer, numeric or complex, and failing that converts it to factor unless as.is = TRUE. The first type that can accept all the non-missing values is chosen... Vectors containing optional whitespace followed by decimal constants representable as R integers or values from na.strings are converted to integer."
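To make the quoted rule concrete, here is a small sketch of which class `type.convert()` picks for a few inputs. The `inputs` list and its element names are made up for illustration, and `as.is = TRUE` is an assumption chosen so that non-numeric input stays character rather than becoming a factor.

```r
# Illustrative inputs only; the names are hypothetical labels, not API terms.
inputs <- list(
  logical   = c("TRUE", "FALSE"),
  integer   = c("1", "2"),      # like attA
  decimal   = c("1.0", "2.0"),  # like attB: whole numbers written with ".0"
  mixed_num = c("1", "2.5"),
  text      = c("1", "a")
)

# Record the class type.convert() chooses for each input.
# as.is = TRUE keeps unconvertible strings as character instead of factor.
classes <- vapply(
  inputs,
  function(x) class(type.convert(x, as.is = TRUE)),
  character(1)
)
print(classes)
```

The `decimal` case comes back as numeric, matching the `attB` output above: the decision appears to hinge on how the number is written, not on whether its value happens to be whole.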
[...] `type.convert()` with `as.integer()`? `as.integer(attB)` works well. Also, `read.table()` could possibly be used, and you can specify `colClasses` there. – Rich Scriven

[...] `type.convert(..., as.is = FALSE)`). That's why I can't use `as.integer()`. – jennybryan
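One way to reconcile the two comments, offered only as a sketch: let `type.convert()` choose the class first, then downgrade a double result to integer when nothing would be lost. The helper name `convert_prefer_integer` is hypothetical (it is not part of base R), and the default `as.is = TRUE` is an assumption; pass `as.is = FALSE` to get factors for character data.

```r
# Hypothetical helper, not part of base R.
# Step 1: type.convert() decides between logical/integer/numeric/character.
# Step 2: a double result is downgraded to integer only if every non-missing
#         value is finite, whole, and within the integer range.
convert_prefer_integer <- function(x, as.is = TRUE, ...) {
  out <- type.convert(x, as.is = as.is, ...)
  if (is.double(out)) {
    # Drop true NAs but keep NaN, so that NaN blocks the downgrade.
    vals <- out[!is.na(out) | is.nan(out)]
    lossless <- all(is.finite(vals)) &&
      all(vals == trunc(vals)) &&
      all(abs(vals) <= .Machine$integer.max)
    if (lossless) out <- as.integer(out)
  }
  out
}

str(convert_prefer_integer(c("1.0", "2.0")))  # whole-valued decimals
str(convert_prefer_integer(c("1.5", "2.0")))  # genuinely fractional
str(convert_prefer_integer(c("a", "b")))      # character data is left alone
```

This avoids editing the character data with a regex, at the cost of a second pass over each numeric column.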