I have imported a Stata .dta file into R using the readstata13 package.

The variables have notes that contain the full text of the survey questions. I found that the attr() function lets me extract variable names (attr(df, "names")), variable labels (attr(df, "var")), and value labels (attr(df, "label")). However, I have not found a way to extract the variable notes.

Is there a way to do so?

Below are a few lines of Stata code that produce a dta file with two variables and variable notes, which can be imported into R.

clear
input int(mpg weight)
34 1800
18 3670
21 4060
15 3720
19 3400
41 2040
25 1990
28 3260
30 1980
12 4720
end
note mpg: Mileage (mpg)
note weight: Weight (lbs.)
save "~/mpg_weight.dta", replace

1 Answer

EDIT:

You can actually do this directly in newer versions of the readstata13 package, as follows:

library(readstata13)

df <- read.dta13("~/mpg_weight.dta")
notes <- attr(df, "expansion.fields")

This returns a list in which each element gives the variable name, the characteristic name, and the contents of the Stata characteristic field.
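As a rough sketch of how that list could be reduced to just the notes: the `ef` object below is a hand-written stand-in for `attr(df, "expansion.fields")`, and its exact shape (one character vector of length three per characteristic) is an assumption based on the description above. In Stata, `note0` stores the number of notes and `note1`, `note2`, ... store the text, so the filter keeps only the numbered entries above zero.

```r
# Hypothetical stand-in for attr(df, "expansion.fields");
# assumed layout per element: variable name, characteristic name, contents.
ef <- list(
  c("mpg",    "note1", "Mileage (mpg)"),
  c("mpg",    "note0", "1"),
  c("weight", "note1", "Weight (lbs.)"),
  c("weight", "note0", "1")
)

# Keep note1, note2, ... and skip note0, which only holds the note count.
is_note <- vapply(ef, function(x) grepl("^note[1-9][0-9]*$", x[2]), logical(1))

# Stack the remaining entries into a data frame of variable/note pairs.
notes_df <- do.call(rbind, lapply(ef[is_note], function(x) {
  data.frame(variable = x[1], note = x[3], stringsAsFactors = FALSE)
}))
```

With real data you would replace `ef` with the attribute itself; it is worth inspecting it with `str()` first, since other characteristics (not just notes) may be stored there too.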


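Original answer: here's a quick workaround in Stata, using your toy example. It copies each variable's notes into a new string variable and exports those to a CSV file: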

clear

input int(mpg weight)
34 1800
18 3670
21 4060
15 3720
19 3400
41 2040
25 1990
28 3260
30 1980
12 4720
end

note mpg: this is the first note
note mpg: and this is the second
note mpg: here's a third
note weight: Weight (lbs.)
save "~/mpg_weight.dta", replace

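// Collect the variable names before any helper variables are created
ds
local varlist `r(varlist)'

// The note0 characteristic holds the number of notes; note1, note2, ... hold the text.
// Assumes every variable has at least one note and no more notes than observations.
foreach var of local varlist {
    generate notes_`var' = ""
    forvalues i = 1 / ``var'[note0]' {
        replace notes_`var' = "``var'[note`i']'" in `i'
    }
}

// One column per variable, one note per row
export delimited notes_* using notes_mpg_weight.dta.csv, replace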

You can then simply import everything into R as strings and go from there.
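A minimal sketch of that import step, assuming the CSV has one `notes_<variable>` column per variable and empty cells where a variable has fewer notes than there are rows. The temporary file written here is only a hypothetical stand-in so the snippet runs on its own; in practice you would pass the path of the exported file to `read.csv()`.

```r
# Hypothetical stand-in for the CSV exported by the Stata loop above.
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "notes_mpg,notes_weight",
  "this is the first note,Weight (lbs.)",
  "and this is the second,",
  "here's a third,",
  ","
), csv_path)

# Read every column as character and never convert any text to NA.
raw <- read.csv(csv_path, colClasses = "character", na.strings = character(0))

# Drop the empty padding cells and strip the "notes_" prefix from the names.
notes <- lapply(raw, function(x) x[nzchar(x)])
names(notes) <- sub("^notes_", "", names(notes))
```

The result is a named list with one character vector of notes per variable, so `notes$mpg` gives all notes attached to `mpg`.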