0
votes

I have a tricky question about how to manipulate some data. Suppose I have the following structure of data:

_n   group   attr     value
1      1     height     3
2      1     weight     12
3      1     length     9  
4      2     weight     15
5      3     height     4

I want to have all groups have height, weight, and length. If there is initially not a value, I want to have a missing value be put in. Thus the end result would look like this:

_n   group   attr     value
1      1     height     3
2      1     weight     12
3      1     length     9 
4      2     height     .
5      2     weight     15
6      2     length     .  
7      3     height     4
8      3     weight     .
9      3     length     .

I don't know how to do this, but perhaps it would involve reshape?

Another thing I thought about would be to use egen to sum by group. We could figure out that group 1 has 3 members, group 2 has 1 member, and group 3 has 1 member. Then we could perform functions on groups 2 and 3 to get them up to par. But this could get complicated.

1
Reminder: Good questions here usually show attempts at code, not just wild guesses at what might be relevant. Your question is clear and specific, so it gets through the frontier.Nick Cox
Thanks for the reminder.bill999

1 Answers

2
votes

This is (1) very easy and (2) usually the wrong way to go.

(1) fillin is dedicated to this task. Your example leaves ambiguous whether attr is a numeric variable with value labels or a string variable, but this works either way:

. clear 

. input   group  str6 attr     value

          group       attr      value
1. 1     "height"     3
2. 1     "weight"     12
3. 1     "length"     9  
4. 2     "weight"     15
5. 3     "height"     4
6. end 

. fillin group attr 

. list, sepby(group) 

      +----------------------------------+
      | group     attr   value   _fillin |
      |----------------------------------|
   1. |     1   height       3         0 |
   2. |     1   length       9         0 |
   3. |     1   weight      12         0 |
      |----------------------------------|
   4. |     2   height       .         1 |
   5. |     2   length       .         1 |
   6. |     2   weight      15         0 |
      |----------------------------------|
   7. |     3   height       4         0 |  
   8. |     3   length       .         1 |  
   9. |     3   weight       .         1 |
      +----------------------------------+

(2) However, what use is this structure? How are you going to relate height, length and weight? Usually, you would be better off with this (assuming same data as sandbox):

  reshape wide value, i(group) j(attr) string 
  renpfix value 
  list, sepby(group) 

Here we end with variables group, height, length and weight. If you still want the long structure, that is now achievable with reshape long.

Notes:

  1. For more on fillin, see the help, the manual entry and this expository note.

  2. renpfix in this case zaps prefixes of variable names. The tacit third argument is an empty string, so the prefix value is replaced with an empty string, namely removed. In recent versions of Stata (12 up), that is now easy with rename.

  3. Presumably these data are just a toy example, but I'd be amazed if the wide structure were not more useful for your real data too. If there's a reason for the long structure, you did not tell us about it. (If you really have panel data, that's a different story, but check out tsfill.)