This is one of the innovations addressed in rio (full disclosure: I wrote this package). Basically, it provides various ways of importing variable labels, including haven's way of doing things and foreign's. Here's a trivial example:
Start by making a reproducible example:
> library("rio")
> export(iris, "iris.dta")
Import using foreign::read.dta() (via rio::import()):
> str(import("iris.dta", haven = FALSE))
'data.frame': 150 obs. of 5 variables:
$ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
$ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
$ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
$ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
$ Species : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
- attr(*, "datalabel")= chr ""
- attr(*, "time.stamp")= chr "15 Jan 2016 20:05"
- attr(*, "formats")= chr "" "" "" "" ...
- attr(*, "types")= int 255 255 255 255 253
- attr(*, "val.labels")= chr "" "" "" "" ...
- attr(*, "var.labels")= chr "" "" "" "" ...
- attr(*, "version")= int -7
- attr(*, "label.table")=List of 1
..$ Species: Named int 1 2 3
.. ..- attr(*, "names")= chr "setosa" "versicolor" "virginica"
Read in using haven::read_dta() using its native variable attributes because the attributes are stored at the data.frame level rather than the variable level:
> str(import("iris.dta", haven = TRUE, column.labels = TRUE))
'data.frame': 150 obs. of 5 variables:
$ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
$ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
$ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
$ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
$ Species :Class 'labelled' atomic [1:150] 1 1 1 1 1 1 1 1 1 1 ...
.. ..- attr(*, "labels")= Named int [1:3] 1 2 3
.. .. ..- attr(*, "names")= chr [1:3] "setosa" "versicolor" "virginica"
Read in using haven::read_dta() using an alternative that we (the rio developers) have found more convenient:
> str(import("iris.dta", haven = TRUE))
'data.frame': 150 obs. of 5 variables:
$ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
$ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
$ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
$ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
$ Species : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
- attr(*, "var.labels")=List of 5
..$ Sepal.Length: NULL
..$ Sepal.Width : NULL
..$ Petal.Length: NULL
..$ Petal.Width : NULL
..$ Species : NULL
- attr(*, "label.table")=List of 5
..$ Sepal.Length: NULL
..$ Sepal.Width : NULL
..$ Petal.Length: NULL
..$ Petal.Width : NULL
..$ Species : Named int 1 2 3
.. ..- attr(*, "names")= chr "setosa" "versicolor" "virginica"
By moving the attributes to be at the level of the data.frame, they're much easier to access using attr(data, "label.var"), etc. rather than digging through each variable's attributes.
Note: the values of attributes will be NULL because I'm just writing a native R dataset to a local file in order to make this reproducible.
lbs <- setNames(_labels_, names(df)); then accessing the label can be done via, e.g.,lbs["var"]- MichaelChiricoread.dtafunction in pkg:foreign.havenis a relatively recent package and at the moment it doesn't seem to have documented plans for labels. - IRTFMread_dtainhavendoes have label. In contrast,foreign::read.dtaactually doesn't. Also, theforeignpackages does not work with Stata 13, let alone 14. - Heisenbergread.dtasays the value will be:"A data frame with attributes. These will include "datalabel", "time.stamp", "formats", "types", "val.labels", "var.labels" and "version" and may include "label.table" and "expansion.table". - IRTFMforeign. Glancing through theforeigndoc, they suggest the readstata13 package for later versions of Stata. Presumably it also conforms to whatever idiom/norm is found in foreign. - Frank