I would like to label variable values using an already constructed data dictionary.
A minimal dataset:
clear
input q1p0 q1p1 q2p0 q2p1 q2p2 q2p3
1 1 1 1 4 34
1 1 2 2 3 36
1 1 1 4 2 45
1 2 2 4 2 46
1 1 1 3 2 23
1 1 2 4 1 35
1 1 2 2 3 22
1 1 1 2 1 17
1 1 1 4 1 40
1 1 2 3 2 18
1 2 2 2 1 40
end
Reading the dictionary by hand, I would write:
label define yesno 1 "Yes" 2 "No"
label values q1p0 q1p1 q2p0 yesno
label define workload 0 "No change" 1 "Very low workload" 2 "Low workload" 3 "More workload" 4 "A lot more workload" 98 "Don’t know"
label values q2p1 q2p2 workload
label define yesnodont 1 "Yes" 2 "No" 98 "Don’t Know"
label values q2p3 yesnodont
However, I have many variables that require labeling and so an automated approach using a loop would be helpful. A minimal dictionary:
clear
input str4 variable str20 valuelabel value
q1p0 "Yes" 1
q1p0 "No" 2
q1p1 "Yes" 1
q1p1 "No" 2
q2p0 "Yes" 1
q2p0 "No" 2
q2p1 "No change" 0
q2p1 "Very low workload" 1
q2p1 "Low workload" 2
q2p1 "More workload" 3
q2p1 "A lot more workload" 4
q2p1 "Don’t know" 98
q2p2 "No change" 0
q2p2 "Very low workload" 1
q2p2 "Low workload" 2
q2p2 "More workload" 3
q2p2 "A lot more workload" 4
q2p2 "Don’t know" 98
q2p3 "Yes" 1
q2p3 "No" 2
q2p3 "Don't know" 98
end
The three variables in the dictionary above hold, respectively, the name of the variable to be labeled (variable), the label text to attach to one of its categories (valuelabel), and the numeric value of that category (value).
How could one automate the process?
I need to generate:
clear
input str4 variable strL labelstatement
q1p0 `"1 "Yes" 2 "No""'
q1p1 `"1 "Yes" 2 "No""'
q2p0 `"1 "Yes" 2 "No""'
q2p1 `"0 "No change" 1 "Very low workload" 2 "Low workload" 3 "More workload" 4 "A lot more workload" 98 "Don’t know""'
q2p2 `"0 "No change" 1 "Very low workload" 2 "Low workload" 3 "More workload" 4 "A lot more workload" 98 "Don’t know""'
q2p3 `"1 "Yes" 2 "No" 98 "Don’t Know""'
end
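One possible way to build such statements and apply them in Stata is sketched below. It is only a sketch under stated assumptions: it needs Stata 13 or later for strL, it uses an abbreviated dictionary and a tiny typed-in dataset, and the names piece, vars and def_* are hypothetical helpers that do not come from the question. It also defines one value label per variable, named after the variable, rather than shared labels such as yesno.

```stata
* Dictionary (abbreviated): one row per variable-value pair
clear
input str4 variable str20 valuelabel value
q1p0 "Yes" 1
q1p0 "No" 2
q2p1 "No change" 0
q2p1 "Very low workload" 1
q2p1 "Low workload" 2
end

* One value-plus-quoted-label pair per row; char(34) is the double quote
gen strL piece = string(value) + " " + char(34) + valuelabel + char(34)

* Accumulate the pairs within each variable, in value order
bysort variable (value): gen strL labelstatement = piece if _n == 1
by variable: replace labelstatement = labelstatement[_n-1] + " " + piece if _n > 1

* The last row of each variable holds the complete statement
by variable: keep if _n == _N
keep variable labelstatement
list, noobs

* Copy the statements into local macros so they survive loading other data
local vars
forvalues i = 1/`=_N' {
    local v = variable[`i']
    local def_`v' = labelstatement[`i']
    local vars `vars' `v'
}

* Data to be labelled (typed in here; in practice loaded with -use-)
clear
input q1p0 q2p1
1 0
2 1
1 2
end

* Define and attach one value label per variable (label name = variable name)
foreach v of local vars {
    label define `v' `def_`v''
    label values `v' `v'
}
```

Because local macros are lost when a do-file ends, the whole sketch should be run in one go. For a large dictionary, writing the definitions to a do-file with label save may be a tidier alternative to macros.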
A related question has been posted before, but that one involved labeling variables and not values: "Stata: Assign labels to range of variables with a loop".
In R I could do this. First, create the minimal dictionary:
library(dplyr)
valuelabels <- read.table(text="variable valuelabel value
q1p0 'Yes' 1
q1p0 'No' 2
q1p1 'Yes' 1
q1p1 'No' 2
q2p0 'Yes' 1
q2p0 'No' 2
q2p1 'No change' 0
q2p1 'Very low workload' 1
q2p1 'Low workload' 2
q2p1 'More workload' 3
q2p1 'A lot more workload' 4
q2p1 'Don\\'t know' 98
q2p2 'No change' 0
q2p2 'Very low workload' 1
q2p2 'Low workload' 2
q2p2 'More workload' 3
q2p2 'A lot more workload' 4
q2p2 'Don\\'t know' 98
q2p3 'Yes' 1
q2p3 'No' 2
q2p3 'Don\\'t know' 98",
header=T, stringsAsFactors=F)
Now create the statements that will finally be executed as Stata code:
valuelabels <- valuelabels %>%
  group_by(variable) %>%
  # build e.g. 1 "Yes" 2 "No" for each variable, with no stray spaces inside the quotes
  mutate(labelstatement = paste0(value, ' "', valuelabel, '"', collapse = ' '),
         labelstatement1 = paste("label define", variable, labelstatement),
         labelstatement2 = paste("label values", variable, variable)) %>%
  select(variable, labelstatement1, labelstatement2) %>%
  slice(1)
which gives:
variable labelstatement1 labelstatement2
q1p0 label define q1p0 1 "Yes" 2 "No" label values q1p0 q1p0
q1p1 label define q1p1 1 "Yes" 2 "No" label values q1p1 q1p1
q2p0 label define q2p0 1 "Yes" 2 "No" label values q2p0 q2p0