I am quite new to R and more used to Stata.
I managed to read a database from Stata into an R data.frame using library(foreign).
library(foreign)
data <- read.dta("mydata.dta",
                 convert.dates = TRUE,
                 convert.factors = TRUE,
                 missing.type = FALSE,
                 convert.underscore = FALSE,
                 warn.missing.labels = TRUE)
Values (in the sense of the Stata language) are however not imported; only labels are imported.
Let me explain a little more. Suppose I want to manipulate an education variable called "edu". In Stata, I use numeric values rather than labels to manipulate my variable, and the data editor shows the labels, as long as I have defined them. Assume for instance that my variable "edu" takes the values 10 to 40. The following code associates a label with each value:
#delimit ;
label define lib_edu
    10 "Less than high-school degree"
    20 "12th grade or higher, no college degree"
    30 "Undergraduate level (2 to 4 years of college)"
    40 "Graduate level (5 years of college or more)", add;
label values edu lib_edu;
#delimit cr
Then, when I want to manipulate my variable, I need to use the values. For example, if I want to drop from my dataset the people whose label is "Less than high-school degree", I simply do:
drop if edu==10
But in my imported R data.frame, the labels are imported as factors. Each label is associated with a level number that does not necessarily correspond to my Stata values, since the numbering restarts from 1. Moreover, I cannot use these level numbers to manipulate my variable. If I want to drop from my dataset the people whose label is "Less than high-school degree", I have to write out the entire label:
data <- data[data$edu!="Less than high-school degree",]
which is not convenient at all, especially when the label is long and complex.
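To make the mismatch concrete, here is a minimal sketch in base R. The vector `edu` is a hypothetical stand-in for the imported column, not the real data, and the mapping back to 10 to 40 assumes the factor levels are stored in the same order as the Stata values.

```r
# Hypothetical stand-in for the imported column: the four Stata labels as a factor.
edu_labels <- c(
  "Less than high-school degree",
  "12th grade or higher, no college degree",
  "Undergraduate level (2 to 4 years of college)",
  "Graduate level (5 years of college or more)"
)
edu <- factor(edu_labels[c(1, 3, 2, 4, 1)], levels = edu_labels)

# The internal integer codes run from 1 to 4, not from 10 to 40.
codes <- as.integer(edu)
print(codes)

# Assumption: levels are in the same order as the Stata values 10, 20, 30, 40,
# so the Stata value can be looked up from the internal code.
stata_value <- c(10, 20, 30, 40)[codes]
print(stata_value)

# Filtering on the integer code instead of typing the full label.
kept <- edu[as.integer(edu) != 1]
print(as.character(kept))
```

If the level order in the real data differs from the Stata value order, the lookup vector would have to be adjusted accordingly.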
Is it possible to do as in Stata, that is: manipulate numeric values while editing the data.frame with labels, given that my data are exported from Stata?
Thanking you in advance.
f = factor(c("a","b")); f[ labels(f)[f] != 1 ]
(excluding "a", which has a code of 1). Personally, I map the long labels to abbreviations and work with those ("none", "hs", "ug", "g") – Frank
f = factor(c("hs", "hs", "none", "g"), levels=c("none","hs","ug","g"), ordered=TRUE); f[ f >= "ug" ]
– Frank
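The two suggestions in the comments above can be combined roughly as follows. This is only a sketch in base R: the vector `edu` is made-up sample data, and the abbreviations are the ones proposed in the comment, assumed to be listed in the same order as the original levels.

```r
# Made-up sample column carrying the long Stata labels.
edu_labels <- c(
  "Less than high-school degree",
  "12th grade or higher, no college degree",
  "Undergraduate level (2 to 4 years of college)",
  "Graduate level (5 years of college or more)"
)
edu <- factor(edu_labels[c(2, 2, 1, 4, 3)], levels = edu_labels)

# Rename the levels in place; assumes the abbreviations follow the level order.
levels(edu) <- c("none", "hs", "ug", "g")

# Mark the factor as ordered so that comparison operators work on it.
edu <- as.ordered(edu)

# Keep everyone at undergraduate level or above.
at_least_ug <- edu[edu >= "ug"]
print(as.character(at_least_ug))

# Drop the lowest category using the short label.
no_none <- edu[edu != "none"]
print(as.character(no_none))
```

Renaming the levels once keeps later filters short, and the ordered factor allows range-style conditions similar to comparing numeric values in Stata.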