1
votes

I have a dataset where each person (row) has values 0, 1 or . in a number of variables (columns).

I would like to create two variables. One that includes the count of all the 0 and one that has the count of all the 1 for each person (row). In my case, there is no pattern in the variable names. For this reason I create a varlist of all the existing variables excluding the ones that need not to be counted.

 +--------+--------+------+------+------+------+------+----------+--------+
 |   ID   | region |  Qa  |  Qb  |  C3  |  C4  |  Wa  |  count 0 | count 1|
 +--------+--------+------+------+------+------+------+----------+--------+
 |      1 |      A |    1 |    1 |    1 |    1 |    . |        0 |      4 |
 |      2 |      B |    0 |    0 |    0 |    1 |    1 |        3 |      2 |
 |      3 |      C |    0 |    0 |    . |    0 |    0 |        4 |      0 |
 |      4 |      D |    1 |    1 |    1 |    1 |    0 |        0 |      4 |
 +--------+--------+------+------+------+------+------+----------+--------+

The following works, however, I cannot add an if statement

ds ID region, not // all variables in the dataset apart from ID region
return list  
local varlist = r(varlist)
egen count_of_1s = rowtotal(`varlist') 

If I change the last line with the one below, I get an error of invalid syntax.

egen count_of_1s = rowtotal(`varlist') if `v' == 1 

I turned from count to summing because I thought this is a sneaky way out of the problem. I could change the values from 0,1 to 1, 2, then sum all the two values separately in two different variables and then divide accordingly in order to get the actual count of 1 or 2 per row.

I found this Stata: Using egen, anycount() when values vary for each observation however Stata freezes as my dataset is quite large (100.000 rows and 3000 columns).

Any help will be very appreciated :-)

Solution based on the response of William

* number of total valid responses (0s and 1s, excluding . )
ds ID region, not // all variables in the dataset apart from ID region
return list  
local varlist = r(varlist)
egen count_of_nonmiss = rownonmiss(`varlist') // this counts all the 0s and 1s (namely, the non missing values)

* total numbers of 1s per row
ds ID region count_of_nonmiss, not // CAUTION: count_of_nonmiss needs not to be taken into account for this!
return list  
local varlist = r(varlist)    
generate count_of_1s = rowtotal(`varlist')
2

2 Answers

4
votes

How about

egen count_of_nonmiss = rownonmiss(`varlist')
generate count_of_0s = count_of_nonmiss - count_of_1s

When the value of the macro varlist is substituted into your if clause, the command expands to

egen count_of_1s = rowtotal(`varlist') if Qa Qb C3 C4 Wa == 1

Clearly a syntax error.

0
votes

I had the same problem to count the occurrences of specifies values in each observation across a set of variables.

I could resolve that problem in the following ways: If you want to count the occurrences of 0 in the values across x1-x2, so

clear
input id x1 x2 x3

        id         x1         x2         x3
1. 1  1 0 2
2. 2  2 0 2
3. 3  2 0 3
4. end
egen count2 = anycount(x1-x3), value(0)