
I would like to label variable values using an already constructed data dictionary. A minimal dataset:

clear
input q1p0  q1p1    q2p0    q2p1    q2p2    q2p3
1   1   1   1   4   34
1   1   2   2   3   36
1   1   1   4   2   45
1   2   2   4   2   46
1   1   1   3   2   23
1   1   2   4   1   35
1   1   2   2   3   22
1   1   1   2   1   17
1   1   1   4   1   40
1   1   2   3   2   18
1   2   2   2   1   40
end

Reading the dictionary manually, I would write:

label define yesno 1 "Yes" 2 "No"
label values q1p0 q1p1 q2p0 yesno

label define workload 0 "No change" 1 "Very low workload" 2 "Low workload" 3 "More workload" 4 "A lot more workload" 98 "Don’t know"    
label values q2p1 q2p2 workload

label define yesnodont 1 "Yes" 2 "No" 98 "Don’t Know"   
label values q2p3 yesnodont

However, I have many variables that require labeling and so an automated approach using a loop would be helpful. A minimal dictionary:

clear
input str4 variable  str20 valuelabel value
q1p0    "Yes"   1
q1p0    "No"    2
q1p1    "Yes"   1
q1p1    "No"    2
q2p0    "Yes"   1
q2p0    "No"    2
q2p1    "No change" 0
q2p1    "Very low workload" 1
q2p1    "Low workload"  2
q2p1    "More workload" 3
q2p1    "A lot more workload"   4
q2p1    "Don’t know"    98
q2p2    "No change" 0
q2p2    "Very low workload" 1
q2p2    "Low workload"  2
q2p2    "More workload" 3
q2p2    "A lot more workload"   4
q2p2    "Don’t know"    98
q2p3    "Yes"   1
q2p3    "No"    2
q2p3    "Don't know"    98
end

The three variables in the dictionary above hold, respectively, the name of the variable to be labeled, the value label to assign to one category of that variable, and the numeric value of that category.

How could one automate the process?

I need to generate:

clear
input str4 variable  strL labelstatement
q1p0    `"1 "Yes" 2 "No""'
q1p1    `"1 "Yes" 2 "No""'      
q2p0    `"1 "Yes" 2 "No""'
q2p1    `"0 "No change" 1 "Very low workload" 2 "Low workload" 3 "More workload" 4 "A lot more workload" 98 "Don’t know""'  
q2p2    `"0 "No change" 1 "Very low workload" 2 "Low workload" 3 "More workload" 4 "A lot more workload" 98 "Don’t know""'  
q2p3    `"1 "Yes" 2 "No" 98 "Don’t Know""'      
end

A related question, "Stata: Assign labels to range of variables with a loop", has been posted before, but it concerned labeling variables, not values.

In R I could do this. First, create the minimal dictionary:

library(dplyr)

valuelabels <- read.table(text="variable valuelabel value
q1p0                 'Yes'     1
q1p0                  'No'     2
q1p1                 'Yes'     1
q1p1                  'No'     2
q2p0                 'Yes'     1
q2p0                  'No'     2
q2p1           'No change'     0
q2p1  'Very low workload'     1
q2p1        'Low workload'     2
q2p1       'More workload'     3
q2p1 'A lot more workload'     4
q2p1          'Don\\'t know'    98
q2p2           'No change'     0
q2p2   'Very low workload'     1
q2p2        'Low workload'     2
q2p2       'More workload'     3
q2p2 'A lot more workload'     4
q2p2          'Don\\'t know'    98
q2p3                 'Yes'     1
q2p3                  'No'     2
q2p3          'Don\\'t know'    98", 
                 header=T, stringsAsFactors=F)

Now create the statements that will finally be executed as Stata code:

valuelabels <- valuelabels %>%
  group_by(variable) %>%
  mutate(labelstatement = paste0(value, ' "', valuelabel, '"', collapse = ' '),  # e.g. 1 "Yes" 2 "No", with no stray space inside the quotes
         labelstatement1 = paste("label define", variable, labelstatement),
         labelstatement2 = paste("label values", variable, variable)) %>%
  select(variable, labelstatement1, labelstatement2) %>%
  slice(1)

which gives:

variable    labelstatement1 labelstatement2
q1p0    label define q1p0 1 "Yes" 2 "No"  label values q1p0 q1p0
q1p1    label define q1p1 1 "Yes" 2 "No"  label values q1p1 q1p1
q2p0    label define q2p0 1 "Yes" 2 "No"  label values q2p0 q2p0
The question is morphing on each edit. That's entirely allowed, but it makes it difficult to follow. Two specific comments: 1. The last block of output isn't legal Stata code. I don't know if it is intended to be. 2. Same comment on the concat() block. - Nick Cox
concat() block now removed. - Nick Cox

2 Answers

0
votes

It's a case of "I wouldn't start from there". But given that you have, suppose you read in those details as data. Then the single command

gen line = "label define " + variable + " " + string(value) + " " + char(34) + valuelabel + char(34) + ", modify" 

would create a variable line with contents fit to be exported and executed as a do-file.

Here char(34) is the " character. There are other ways to ensure that literal quote marks get added, but that is fairly low risk.

Otherwise put, you have the ingredients there for a do-file. You just need to add some text and re-order.
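
As a rough sketch of that export step, the statements held in line could be written out with file write and then run against the data. The filename labels.do, the shortened dictionary, and the two-row dataset are all assumptions made for illustration. The label values lines are an addition to the command above, since label define on its own does not attach a label to a variable.

```stata
* Sketch only: labels.do is a hypothetical filename, and a shortened
* dictionary stands in for the full one from the question.
clear
input str4 variable str20 valuelabel value
q1p0 "Yes" 1
q1p0 "No" 2
q2p3 "Yes" 1
q2p3 "No" 2
q2p3 "Don't know" 98
end

* One label define statement per dictionary row
gen line = "label define " + variable + " " + string(value) + " " + char(34) + valuelabel + char(34) + ", modify"

* Write those statements to a do-file, one per line
file open fh using "labels.do", write text replace
forvalues i = 1/`=_N' {
    file write fh (line[`i']) _n
}

* Add one label values statement per variable named in the dictionary
levelsof variable, local(vars) clean
foreach v of local vars {
    file write fh "label values `v' `v'" _n
}
file close fh

* Run the do-file against the data (a two-row stand-in for the real dataset)
clear
input q1p0 q2p3
1 98
2 1
end
do labels.do
```

Each label set here is named after its variable, which keeps the generated file simple at the cost of defining identical sets (such as Yes/No) more than once.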

You can mix instructions like this and real data, insofar as any text could go in a string variable. But it's a matter of style whether you do. (I'd usually reach for any favourite text editor.)

2
votes

Let's create a dataset with labels in the way you presented them. First, let's assume, to make it a bit easier to follow, that we are talking about a country. Let's assume the country has 4 states, each with 4 districts, each with 4 towns.

This gives us 1 + 4 + 4^2 + 4^3 = 85 objects.

clear
set more off
set obs 85

gen name=""
replace name = "country" in 1
replace name = "state" in 2/5
replace name = "district" in 6/21
replace name = "town" in 22/85

bysort name: gen value = _n
gen label = name + strofreal(value)

Great! Now we have our label names, values and the labels themselves. Let's go ahead and save this as a .dta file to open later. You wanted to reference a .csv file, but it's all the same. You would just need to use import delimited instead of use later on.

tempfile labels
save `labels'

Note that you need to have this all in one .do file, since I am referencing temporary files. I would suggest copy/pasting this into a .do file rather than inputting it directly line-by-line.

Now we need to create a sample dataset that would need labels, with the variables country, state, district, and town.

clear
set more off
set obs 1

foreach x in country state district town {
    gen `x' = _n
    expand 4
    sort `x'
}
duplicates drop town, force

Now there is one unique country, 4 unique states, 16 unique districts, and 64 unique towns. Each is stored as an integer variable with no value label attached yet.
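
One quick way to confirm those counts is to tabulate each variable and read r(r), the number of distinct values that a one-way tabulate leaves behind. This is only a sketch of such a check. It repeats the setup above so that it can be run on its own.

```stata
* Rebuild the sample dataset exactly as above
clear
set obs 1
foreach x in country state district town {
    gen `x' = _n
    expand 4
    sort `x'
}
duplicates drop town, force

* Report the number of distinct values of each variable
foreach x in country state district town {
    quietly tabulate `x'
    display "`x': " r(r) " distinct values"
}
```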

Now we will create a loop that preserves this data, then references the temporary file with the labels we want for each variable-value.

foreach x in country state district town {
    tempfile `x'do                              
    preserve                                        
        use `labels', clear                             // Reference the "labels" tempfile we made earlier
        keep if name == "`x'"                           // Keep only the rows where name is the name of the label we want to define
        qui sum value                                   // Allows me to reference r(N) later, the number of rows in the dataset
        forvalues i = 1/`r(N)' {                        // Inner loop generates value/label pair for each unique value of x
            local value = value[`i']                    // Pulls out the value for row 1, 2, 3,... etc.
            local label = label[`i']                    // Pulls out the label string for row 1, 2, 3,.. etc.
            label define `x' `value' "`label'", add     // Define and continuously add to label definition each time it loops
        }                                               // End inner loop
        label save `x' using ``x'do', replace           // Saves label instructions in temporary do file so we can access it after "restore"
    restore
    do ``x'do'                                          // Re-create the labels in our main dataset (they were lost after "restore")
    label values `x' "`x'"                              // Apply label to values                                
}                                                       // End outer loop

And voila! Labels for each variable have been generated and applied. The answer seems long, but remember that we had to create mock data first. The loop is all you actually need, provided that the "name" in the labels file matches the name of the variable whose values you want to label.
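
For the dictionary layout in the question (variable, valuelabel, value), the same idea might look like the sketch below. It is an adaptation under stated assumptions, not the loop above verbatim: the dictionary is shortened, a two-row dataset stands in for the real data, and all label sets go into a single temporary do-file rather than one file per variable.

```stata
* Sketch only: shortened dictionary in the question's layout
clear
input str4 variable str20 valuelabel value
q1p0 "Yes" 1
q1p0 "No" 2
q2p1 "No change" 0
q2p1 "Low workload" 2
end

* Define each label set inside the dictionary dataset, named after its variable
forvalues i = 1/`=_N' {
    local v = variable[`i']
    local n = value[`i']
    local t = valuelabel[`i']
    label define `v' `n' `"`t'"', modify
}

* Save all label definitions to one temporary do-file and remember the variable names
tempfile labdo
label save using "`labdo'", replace
levelsof variable, local(vars) clean

* Switch to the data (a two-row stand-in), recreate the labels, and attach them
clear
input q1p0 q2p1
1 0
2 2
end
do "`labdo'"
foreach v of local vars {
    label values `v' `v'
}
```

Because the file is temporary, this has to be run as one do-file, as noted earlier. Walking the dictionary row by row avoids the preserve/restore pair, since the labels are written to disk before the data are loaded.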