Sum across multiple columns by row (in Stata)

Question

I'm trying to sum the values across multiple columns (variables) for each row and store the summed values as a new column. However, my data (a Stata file) has more than 500 variables with each columns named by some abbreviations without any identifiable prefix pattern (also, the first couple variables are names and IDs), so it is not feasible to use varlist inside a rowtotal() function nor using the wildcard method rowtotal(prefix*).

I'm wondering if there's a way to subset Stata data by the range of columns and apply rowtotal() over tjose columns in a way like R (e.g., df[, 3:500]) since I know the range of columns that I want to sum over. Something like this

year    state   aalco   aata    acdt    acpeu
2000    usa 0   0   -1  0
2001    usa 0   0   -1  0
2002    usa 0   0   -1  0
2003    usa 0   0   -1  0
2004    usa 0   0   -1  0
2005    usa 0   0   -1  0
2006    usa 0   0   -1  0
2007    usa 0   0   -1  0
2008    usa 0   0   -1  0
2009    usa 0   0   -1  0
2010    usa 0   0   -1  0
2011    usa 0   0   -1  0
2012    usa 0   0   -1  0

I attached the link of my data here and hope that someone could give me some hints about this https://www.dropbox.com/s/fy5zpmf2tdlf3wx/dyadic_format3.dta?dl=1

I've references these posts (here, and here) but they don't quite solve my puzzle.

Let's flag here a general point implied by @Pearly Spencer's answer. It is often easy to specify a varlist as a range of variable names or even by excluding what you don't want. — Nick Cox
Yes, indeed, I was actually trying to use ds to remove those string variables and convert the "-1" values of the rest of the numeric variables to "0", getting a bit confused with varlist as it still include the first couple string variables, therefore causing type mismatch error. — Chris T.
I am familiar with ds as its notional author. You can combine specifications such as numeric variables that aren't named identifiers. findname is my second go with more functionality and I think better syntax, but I would say that. — Nick Cox

Unknown Unknown · Accepted Answer · 2020-07-28T16:03:07

Using a slightly altered example for better illustration, the following works for me:

clear

input year  str3 state   aalco   aata    acdt    acpeu
2000    usa 0   3   -1  0
2001    usa 0   0   -1  0
2002    usa 0   0   -1  0
2003    usa 0   0   -1  0
2004    usa 0   7   -1  0
2005    usa 0   0   -1  0
2006    usa 0   0   -1  0
2007    usa 0   0   -1  0
2008    usa 0   0   -1  0
2009    usa 0   0   -1  0
2010    usa 0   0   -1  1
2011    usa 0   0   -1  9
2012    usa 0   0   -1  0
end

ds year state, not
putmata data = (`r(varlist)'), view replace
mata: st_store((1::13), st_addvar("double","foo"), rowsum(data[.,2..4]))

Results:

list

     +--------------------------------------------------+
     | year   state   aalco   aata   acdt   acpeu   foo |
     |--------------------------------------------------|
  1. | 2000     usa       0      3     -1       0     2 |
  2. | 2001     usa       0      0     -1       0    -1 |
  3. | 2002     usa       0      0     -1       0    -1 |
  4. | 2003     usa       0      0     -1       0    -1 |
  5. | 2004     usa       0      7     -1       0     6 |
     |--------------------------------------------------|
  6. | 2005     usa       0      0     -1       0    -1 |
  7. | 2006     usa       0      0     -1       0    -1 |
  8. | 2007     usa       0      0     -1       0    -1 |
  9. | 2008     usa       0      0     -1       0    -1 |
 10. | 2009     usa       0      0     -1       0    -1 |
     |--------------------------------------------------|
 11. | 2010     usa       0      0     -1       1     0 |
 12. | 2011     usa       0      0     -1       9     8 |
 13. | 2012     usa       0      0     -1       0    -1 |
     +--------------------------------------------------+

Sum across multiple columns by row (in Stata)

2 Answers