Generate variable if greater than mean, by group

Question

I would like to generate a variable newvar that equals 1 for observations that are above the average on both of two variables (var1 and var2). The average should not be the overall dataset average, but the average within the group to which the observation belongs (variable group).

Here is a replicable example:

clear
input str59 group float(var1 var2)
"Algeria"  0 .000033156746
"Algeria"  0  .00017467902
"Algeria"  0  .00024518964
"Algeria"  0    .000624308
"Angola"  0   .0007729884
"Angola"  0   .0014512347
"Angola"  0    .001463664
"Angola"  0   .0015886982
end

dimitriy dimitriy · Accepted Answer · 2018-09-06T19:03:09

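Here's one way to do this. Start with an all-zero counter variable, above_grp_means. Loop through the two variables, calculating the group-specific mean of each and adding 1 to above_grp_means if the value is at or above that mean (the code uses >=; change it to > if you want strictly above). The counter then equals 2 only for observations that pass on both variables, so the last step recodes above_grp_means to a binary flag.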

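* counter: number of variables on which the observation is at or above its group mean
gen above_grp_means = 0

foreach x of varlist var1 var2 {
    * group-specific mean of the current variable
    bysort group: egen mean = mean(`x')
    replace above_grp_means = above_grp_means + 1 if `x'>=mean & !missing(`x')
    drop mean
}

* 1 if at or above the group mean on both variables, 0 otherwise
replace above_grp_means = cond(above_grp_means==2,1,0)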

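Stata treats missing values as larger than any nonmissing number, so a comparison like `x'>=mean would be true when `x' is missing. The second part of the if condition, !missing(`x'), keeps such observations from being counted in case you have missing data.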
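As a quick illustration of the missing-value behavior described above, here is a minimal sketch on toy data. The variable x and the flag names are made up for this example and are not part of the question's dataset.

```stata
* Toy data (hypothetical): one nonmissing value and one missing value
clear
input float x
1
.
end

* Naive comparison: the missing row is flagged as well,
* because missing counts as larger than any number
gen byte flag_naive = x >= 0

* Guarded comparison: the missing row gets 0
gen byte flag_safe = x >= 0 & !missing(x)

list
```

Here flag_naive is 1 in both rows, while flag_safe is 1 only in the row where x is not missing.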
