Stata: Using egen, anycount() when values vary for each observation

Question

Each observation in my data represents a player who follows some random pattern. Variables move1, move2, and so on record which player was active on each move. I need to count the number of times each player was active.

The data look as follows (with _count representing the variable that I would like to generate). The number of moves can also differ between simulations.

+------------+------------+-------+-------+-------+-------+-------+-------+--------+
| simulation | playerlist | move1 | move2 | move3 | move4 | move5 | move6 | _count |
+------------+------------+-------+-------+-------+-------+-------+-------+--------+
|          1 |          1 |     1 |     1 |     1 |     2 |     . |     . |      3 |
|          1 |          2 |     2 |     2 |     4 |     4 |     . |     . |      2 |
|          2 |          3 |     1 |     2 |     3 |     3 |     3 |     3 |      4 |
|          2 |          4 |     4 |     1 |     2 |     3 |     3 |     3 |      1 |
+------------+------------+-------+-------+-------+-------+-------+-------+--------+

egen with anycount() is not directly applicable in this case because the argument for the value() option is not a constant integer: it varies from observation to observation.

I tried looping over the observations and using egen rowwise (see below), but _count stays missing (as initialised) and the approach is not very efficient (I have 50,000 observations). Is there a way to do this in Stata?

gen _count =.  
quietly forval i = 1/`=_N' {  
    egen temp = anycount(move*), values( `=`playerlist'[`i']')
    replace _count = temp
    drop temp
}

Nick Cox · Accepted Answer · 2013-08-31T07:31:39

You can easily cut out the loop over observations. In addition, egen is only to be used for convenience, never speed.

gen _count = 0 
quietly forval j = 1/6 {  
    replace _count = _count + (move`j' == playerlist)
}

or

gen _count = move1 == playerlist 
quietly forval j = 2/6 {  
    replace _count = _count + (move`j' == playerlist)
}
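
If the number of move variables is not known in advance, the same idea might be written as a loop over the varlist rather than over the numbers 1/6. This is only a minimal sketch, assuming that every variable matching move* is a numeric move variable; the input block merely recreates the example data from the question so that the sketch runs on its own.

```stata
* recreate the example data from the question
clear
input simulation playerlist move1 move2 move3 move4 move5 move6
1 1 1 1 1 2 . .
1 2 2 2 4 4 . .
2 3 1 2 3 3 3 3
2 4 4 1 2 3 3 3
end

* loop over whatever move* variables exist instead of hard-coding 1/6
gen _count = 0
quietly foreach v of varlist move* {
    * (`v' == playerlist) is 1 where the move matches the player, 0 otherwise
    replace _count = _count + (`v' == playerlist)
}

list
```

A missing move never equals a nonmissing playerlist, so missing values add 0 to the count.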

Even if you had been determined to use egen, the loop need only be over the distinct values of playerlist, not all the observations. Say the maximum is 42:

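gen _count = 0 
quietly forval k = 1/42 { 
    * count occurrences of the value k in each row of move*
    egen temp = anycount(move*), value(`k') 
    * keep that count only for observations whose player is k
    replace _count = temp if playerlist == `k'
    drop temp 
}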

But that's still a lousy method for your problem. (I wrote the original of anycount() so I can say why it was written.)

See also http://www.stata-journal.com/sjpdf.html?articlenum=pr0046 for a review of working rowwise.

P.S. Your code contains bugs.

Each pass through the loop replaces your count variable in all observations, so at the end every observation holds the result calculated on the final pass, for the last observation.

Values are compared with a local macro playerlist. You presumably have no local macro of that name, so the macro is evaluated as empty. The result is that you end by comparing each value of your move* variables with the observation numbers. You meant to use the variable name playerlist, but the single quotation marks force the macro interpretation.

For the record, this fixes both bugs:

gen _count = .
quietly forval i = 1/`=_N' {
    egen temp = anycount(move*), values(`= playerlist[`i']')
    replace _count = temp in `i'
    drop temp
}
