3
votes

I'm trying to write a foreach loop in Stata that automatically replaces all "n.a." values that I have on my database with missing values (".a"). I have the following code:

foreach var of varlist `allvar' {
    replace `var' = ".a" if `var' == "n.a."
}

I defined my varlist with all variables in my dataset, but I get a "type mismatch" error. I tried the replace command with just one variable and I notice that the error is related to numeric variables. Any idea how can I make this work with foreach to all variables? Or do I need to select just string variables?

2
Note that Stata doesn't attach any special meaning to the string ".a" It does attach special meaning to the numeric value .a. Although the thread is about how to do it, replacing a string value with an easily explained meaning with another that needs to be explained is not an obviously helpful step.Nick Cox

2 Answers

4
votes

The code indeed only applies to string variables. Consider the line

replace `var' = ".a" if `var' == "n.a."

The type mismatch error is returned under two conditions in this line: from trying to assign the string ".a" (which is different from the missing value .a) to a numeric variable, and trying to examine if a numeric variable equates the string "n.a.".

The following code avoids the problem by only choosing string variables (Thanks to Nick Cox for suggestions to simplify).

ds,has (type string)
foreach var in `r(varlist)' {
    replace `var' = ".a" if `var' == "n.a."
}

This useful warning is from Nick Cox: "Also, watch out when using equals signs to assign a string to a local macro. In many versions of Stata, the string would be truncated because of a limit on the length of string expressions."

Added after reading Roberto's answer

Roberto's examples are really useful. Particularly, I haven't thought of the issue of "n.a." as value labels. I expanded on one of the example and for a code that replaces values labeled as "n.a." with the missing value .a.

clear
input x y
3 2
1 3
2 3
3 3
1 1
end
label define lblx 1 "a" 2 "b" 3 "n.a."
label values x y lblx
list

foreach var of varlist _all {
    loc na ""   // reset local
    loc vallab "" // reset local

    loc vallab:value label `var'
    qui levelsof `var',l(lvs)
    foreach val of local lvs    {
        loc na: label `vallab' `val'
        replace `var'=.a if "`na'"=="n.a." & `var'==`val'
    }

}
list
4
votes

Consider some examples.

Variable is numeric but has labels attached. Observations with a label of "n.a." have been replaced with missings.

clear

input ///
x
1
1
2
3
3
end

label define lblx 1 "a" 2 "b" 3 "n.a."
label values x lblx

list
list, nolabel

foreach var of varlist _all {
    replace `var' = .a if `var' == 3
}

list, nolabel
count if missing(x)

Variable is string type. Strings were replaced with other strings; the ".a"s are not really missings, as far as Stata is concerned. It's just text with no special meaning.

    clear

    input ///
    str5 x
    a
    a
    b
    n.a.
    n.a.
    end

    list

    foreach var of varlist _all {
        replace `var' = ".a" if `var' == "n.a."
    }

    list

The following doesn't work and maybe is your case. You have a numeric variable (again with labels) and you are asking Stata to check for string characters. Thus, there is a type mismatch.

clear

input ///
x
1
1
2
3
3
end

label define lblx 1 "a" 2 "b" 3 "n.a."
label values x lblx

list

foreach var of varlist _all {
    replace `var' = ".a" if `var' == "n.a."
}

As it is, your code is addressing the variables as if they were string type (because of the quotes used in the loop). If you want to use extended missing values (.a), then the variable must be numeric. System missings and extended missings only apply to them. The only missing for string types is a blank ("").

It seems you have numeric variables due to the error you report. If your variables were all string type, then you wouldn't get a type mismatch error. Nevertheless, you mention some "n.a." , which I take to be value labels. Underlying the value label, is some numeric value. You can see them running list, nolabel.

If that is the case, you can replace the numeric values corresponding to value labels equal to "n.a.", with something like:

clear

*----- example data -----

input ///
x y
1 1
1 4
2 4
3 4
3 2
end

label define lblx 1 "a" 2 "b" 3 "n.a."
label values x lblx

label define lbly 1 "a" 2 "b" 4 "n.a."
label values y lbly

list
list, nolabel

*----- what you want -----

foreach var of varlist _all {
    replace `var' = .a if `var' == "n.a.":`:value label `var''
}

// check
list, nolabel
count if missing(x)
count if missing(y)

Notice that for variable x, the value label n.a. is mapped to the value 3, while for variable y, it is mapped to value 4. Thus, for one variable you want to replace values of 3, and for the other, values of 4. The code will take care of that automatically. Additionally, Stata now recognizes the replaced values as missings.

See help labels, help missing and https://stackoverflow.com/a/25942520/2077064, which explains more on how the condition within the replace is working.

If you need to select one specific type of variables, Aspen Chen has already mentioned ds. Other options can be found at

The Stata Journal (2010) 10, Number 2, pp. 281–296, Speaking Stata: Finding variables, by Nick Cox. (Freely available on the web.)