
I want to mimic the behavior of Stata's tabulate, generate() command in R. As illustrated below, the command does two things. First, in my example, it produces a one-way table of frequency counts. Second, it generates a dummy variable for each distinct value of the variable (var1), naming the dummies (d_1 - d_7) with the prefix (stubname) declared in the generate() option. My question concerns the second functionality. Base R solutions are preferred, but package-dependent ones are also welcome.

[Edit]: My final goal is to generate a data.frame() that reproduces the last data set shown below (the output of list, sep(11)).

clear all
input var1 
0
1
2
2
2
2
42
42
777
888
999999
end
tabulate var1, gen(d_)

/*     var1 |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |          1        9.09        9.09
          1 |          1        9.09       18.18
          2 |          4       36.36       54.55
         42 |          2       18.18       72.73
        777 |          1        9.09       81.82
        888 |          1        9.09       90.91
     999999 |          1        9.09      100.00
------------+-----------------------------------
      Total |         11      100.00          */


list, sep(11)



/*   +--------------------------------------------------+
     |   var1   d_1   d_2   d_3   d_4   d_5   d_6   d_7 |
     |--------------------------------------------------|
  1. |      0     1     0     0     0     0     0     0 |
  2. |      1     0     1     0     0     0     0     0 |
  3. |      2     0     0     1     0     0     0     0 |
  4. |      2     0     0     1     0     0     0     0 |
  5. |      2     0     0     1     0     0     0     0 |
  6. |      2     0     0     1     0     0     0     0 |
  7. |     42     0     0     0     1     0     0     0 |
  8. |     42     0     0     0     1     0     0     0 |
  9. |    777     0     0     0     0     1     0     0 |
 10. |    888     0     0     0     0     0     1     0 |
 11. | 999999     0     0     0     0     0     0     1 |
     +--------------------------------------------------+ */
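For the first functionality (the one-way frequency table), a base-R sketch along these lines should give the same Freq, Percent and Cum. columns as the tabulate output above. It assumes var1 has no missing values; the column name Cum is my choice, and Stata's Total row is not reproduced (sum(tab$Freq) gives it).

```r
# Data from the question
var1 <- c(0, 1, 2, 2, 2, 2, 42, 42, 777, 888, 999999)

freq <- table(var1)                        # counts per distinct value, sorted ascending
pct  <- 100 * as.vector(prop.table(freq))  # share of all observations, in percent

tab <- data.frame(
  var1    = as.numeric(names(freq)),  # table() stores the values as names
  Freq    = as.vector(freq),
  Percent = round(pct, 2),
  Cum     = round(cumsum(pct), 2)     # running total of the percentages
)
print(tab, row.names = FALSE)
```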
model.matrix(~0+x, data.frame(x = factor(1:5))) would create the second table. – Eyayaw
This would give you a little intro to R from a Stata perspective: rslblissett.com/wp-content/uploads/2016/09/RTutorial_160930.pdf – Eyayaw
Thanks for the reference @EyayawB.! I'll check that. However, I updated the question to be more precise, and I suspect the solution you provided is no longer suitable. – Álvaro A. Gutiérrez-Vargas

2 Answers


I guess you are assuming each value in var1 is unique: the code below creates one column per element of var1, so repeated values would give duplicate columns rather than one dummy per distinct value.

You could try something like this:

var1 <- 1:5
# one column per element of var1: 1 where var1 equals that element, 0 otherwise
dummy_matrix <- vapply(var1, function(x) as.numeric(var1 == x), numeric(length(var1)))
colnames(dummy_matrix) <- paste0("d_", var1) # name the columns
data.frame(var1, dummy_matrix) # bind to var1 as a data.frame

Output:

  var1 d_1 d_2 d_3 d_4 d_5
1    1   1   0   0   0   0
2    2   0   1   0   0   0
3    3   0   0   1   0   0
4    4   0   0   0   1   0
5    5   0   0   0   0   1
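The same vapply() idea extends to data with repeated values, such as var1 in the question, by looping over the sorted distinct values instead of over every observation. A sketch, assuming no missing values and that ascending order of the distinct values matches Stata's tabulate order; the dummies are numbered by position (d_1, d_2, ...) rather than by value, as Stata does:

```r
# Data from the question: 2 appears four times, 42 twice
var1 <- c(0, 1, 2, 2, 2, 2, 42, 42, 777, 888, 999999)

vals <- sort(unique(var1))  # distinct values in ascending order

# One column per distinct value: 1 where the observation equals it, 0 otherwise
dummy_matrix <- vapply(vals, function(v) as.numeric(var1 == v), numeric(length(var1)))
colnames(dummy_matrix) <- paste0("d_", seq_along(vals))  # d_1 ... d_7

df <- data.frame(var1, dummy_matrix)
df
```

The result has the same eight columns as the Stata listing, with exactly one 1 in each row.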
set.seed(123)
df = data.frame(var1 = factor(sample(10, 20, TRUE)))

df = data.frame(df, model.matrix(~0+var1, df)) # 0 suppresses the intercept, so every observed level of var1 gets its own dummy; no base group is dropped.
names(df)[-1] = paste0('d_', 1:(ncol(df)-1))
df
    var1 d_1 d_2 d_3 d_4 d_5 d_6 d_7 d_8 d_9
1     3   0   1   0   0   0   0   0   0   0
2     3   0   1   0   0   0   0   0   0   0
3    10   0   0   0   0   0   0   0   0   1
4     2   1   0   0   0   0   0   0   0   0
5     6   0   0   0   0   1   0   0   0   0
6     5   0   0   0   1   0   0   0   0   0
7     4   0   0   1   0   0   0   0   0   0
8     6   0   0   0   0   1   0   0   0   0
9     9   0   0   0   0   0   0   0   1   0
10   10   0   0   0   0   0   0   0   0   1
11    5   0   0   0   1   0   0   0   0   0
12    3   0   1   0   0   0   0   0   0   0
13    9   0   0   0   0   0   0   0   1   0
14    9   0   0   0   0   0   0   0   1   0
15    9   0   0   0   0   0   0   0   1   0
16    3   0   1   0   0   0   0   0   0   0
17    8   0   0   0   0   0   0   1   0   0
18   10   0   0   0   0   0   0   0   0   1
19    7   0   0   0   0   0   1   0   0   0
20   10   0   0   0   0   0   0   0   0   1
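Applied to the data from the question, the model.matrix() approach could look like the sketch below. The intercept model is included only to show the difference: with ~ 0 + every level of factor(var1) gets a column, whereas with an intercept the first level (0) becomes the base group and has no column of its own. This assumes var1 has no missing values; under the default na.action, model.matrix() drops rows with NA, which would misalign the dummies with df.

```r
# Data from the question (assumed to contain no missing values)
df <- data.frame(var1 = c(0, 1, 2, 2, 2, 2, 42, 42, 777, 888, 999999))

# No intercept: one 0/1 column for each of the 7 levels of factor(var1)
mm_full <- model.matrix(~ 0 + factor(var1), data = df)

# With intercept (treatment coding): "(Intercept)" plus 6 dummies;
# the first level, 0, is the base and gets no column
mm_treat <- model.matrix(~ factor(var1), data = df)
colnames(mm_treat)

# Rename the full set Stata-style and bind it to the original column
colnames(mm_full) <- paste0("d_", seq_len(ncol(mm_full)))
out <- data.frame(df, mm_full)
out
```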