If I read in a Stata or SAS dataset with labels using haven
, it will be (at least in haven
0.2.0) read with the following format:
library(dplyr)
df1 <- data_frame(fips = structure(c(1001, 1001, 1001, 1001, 1001),
label = "FIPS (numeric)"),
id = structure(letters[1:5], label = "ID"))
df2 <- data_frame(fips = structure(c(1001, 1003, 1005, 1007, 1009),
label = "FIPS (numeric)"),
state = structure("AL", label = "State Abbreviation"))
(If necessary, I can post some Stata data that produces this, but this should be easy to verify using any labeled Stata/SAS dataset.)
When I try to use any of the dplyr
join functions to join on a labeled column, I am sorely disappointed:
df1 %>% inner_join(df2)
returns the error
Error in eval(expr, envir, enclos) : cannot join on columns 'fips' x 'fips': Can't join on 'fips' x 'fips' because of incompatible types (numeric / numeric)
The only way to avoid it seems to be to remove the labels on the join variables:
df1 %>%
mutate(fips = `attr<-`(fips, 'label', NULL)) %>%
inner_join(df2 %>% mutate(fips = `attr<-`(fips, 'label', NULL)))
which raises the question of why the labels were read in the first place. (The join also obliterates the labels in df2
.)
This would seem to be a bug in the way haven
and dplyr
interact. Is there a better solution?
dplyr
interacts with structures, given that it's reproducable without usinghaven
? – hoyland