0
votes

I have the following two tables. One has a couple of bet results and the other has a number of 'dummy' bets the need to be added. I want to get mean of the original sample, the mean of the sample with the dummy bets added and then perform both a Chi Squared test of the difference between the columns and a Kruskal Wallis test on the difference between the rows.

I'm having an issue with tabulating the data to product the mean for both categories.

   data A;
   input username $  betdate : datetime. stake winnings node $;
   dateOnly = datepart(betdate) ;
   format betdate DATETIME.;
   format dateOnly ddmmyy8.;
   datalines; 
    player1 12NOV2008:12:04:01 90 -90 X
    player1 04NOV2008:09:03:44 100 40  L
    player2 07NOV2008:14:03:33 120 -120 W
    player1 05NOV2008:09:00:00 50 15 L
    player1 05NOV2008:09:05:00 30 5 W
    player1 05NOV2008:09:00:05 20 10 L
    player2 09NOV2008:10:05:10 10 -10 W
    player2 15NOV2008:15:05:33 35 -35 W
    player1 15NOV2008:15:05:33 35 15 L
    player1 15NOV2008:15:05:33 35 15 L
run;

proc sql; create table B(toAdd num,node char(100)); quit;
proc sql;  insert into B (toAdd, node)                                                                                         
    values(5, 'X')
    values(3, 'L')
    values(7, 'W') ;                                                                                                                                                                                            
quit; 

I want to show the mean without dummy bets and the mean with the dummy bets included. I'm added the dummy bets as follows:

proc sort data=A out=A; by node; run;
data A;
    modify A B;
    by node;
    do i = 1 to toAdd;
    stake = 0;
    stakediff = -1;
    dummy = 1;
    output;
end;
run;

The problem is when I tabulate the data, because there isn't really two distinct categories, it's not showing me what I want.

proc tabulate data=A;
class node dummy;
    var stake winnings;
    table node="",stake="" * (Mean="")*(dummy="" ALL);
run;

I'm using the dummy bets to create a mean that's based on a large 'N'. I would just do this in PROC Report and calculate the mean manually with the larger 'N' as a numerator, but I need to perform a Kruskal Wallis and Chi-Squared test. It's easier to have the dummy bets with a stake of zero to keep things simple and maintain the correct counts in each category. Moreover, it's non-trivial to calculate the standard error on-the-fly (or back it out of the result created by PROC TABULATE) without having the dummy bets in each category.

How can I just show the result of PROC TABULATE above, but without the 0, 1 and ALL categories as the entries when the dummy is 1 are meaningless? Ideally, I'd like to see 'WITHOUT DUMMIES' as 0 and 'WITH DUMMIES' as 1 and display the result of the ALL column as the 'WTIH DUMMIES' = 1 category. I can then proceed to performin the KRUSKAL WALLIS on the 'NODE' class variable and the CHI-SQUARED on the dummy class variable because as it stands, I can't perform these tests with only the 0 category and the 1 category as classes in the tests.

If I could copy all the rows that are in category dummy = 0 into the category dummy = 1 it would solve the problem, I think.

1

1 Answers

1
votes

Your 'if I could' is the right idea, largely. You need to fix your data to reflect the groupings you want; dummy=0 should be only nondummy bets, dummy=1 should be dummy AND nondummy bets, if I understand correctly. So you need to output the dummy=0 rows twice, once with dummy=1 and once with dummy=0.

Something like:

data A;
    modify A B;
    by node;
    output;
    dummy=1;
    output;
    do i = 1 to toAdd;
    stake = 0;
    stakediff = -1;
    dummy = 1;
    output;
end;
run;