I have the following two tables. One has a couple of bet results and the other has a number of 'dummy' bets the need to be added. I want to get mean of the original sample, the mean of the sample with the dummy bets added and then perform both a Chi Squared test of the difference between the columns and a Kruskal Wallis test on the difference between the rows.
I'm having an issue with tabulating the data to product the mean for both categories.
data A;
input username $ betdate : datetime. stake winnings node $;
dateOnly = datepart(betdate) ;
format betdate DATETIME.;
format dateOnly ddmmyy8.;
datalines;
player1 12NOV2008:12:04:01 90 -90 X
player1 04NOV2008:09:03:44 100 40 L
player2 07NOV2008:14:03:33 120 -120 W
player1 05NOV2008:09:00:00 50 15 L
player1 05NOV2008:09:05:00 30 5 W
player1 05NOV2008:09:00:05 20 10 L
player2 09NOV2008:10:05:10 10 -10 W
player2 15NOV2008:15:05:33 35 -35 W
player1 15NOV2008:15:05:33 35 15 L
player1 15NOV2008:15:05:33 35 15 L
run;
proc sql; create table B(toAdd num,node char(100)); quit;
proc sql; insert into B (toAdd, node)
values(5, 'X')
values(3, 'L')
values(7, 'W') ;
quit;
I want to show the mean without dummy bets and the mean with the dummy bets included. I'm added the dummy bets as follows:
proc sort data=A out=A; by node; run;
data A;
modify A B;
by node;
do i = 1 to toAdd;
stake = 0;
stakediff = -1;
dummy = 1;
output;
end;
run;
The problem is when I tabulate the data, because there isn't really two distinct categories, it's not showing me what I want.
proc tabulate data=A;
class node dummy;
var stake winnings;
table node="",stake="" * (Mean="")*(dummy="" ALL);
run;
I'm using the dummy bets to create a mean that's based on a large 'N'. I would just do this in PROC Report and calculate the mean manually with the larger 'N' as a numerator, but I need to perform a Kruskal Wallis and Chi-Squared test. It's easier to have the dummy bets with a stake of zero to keep things simple and maintain the correct counts in each category. Moreover, it's non-trivial to calculate the standard error on-the-fly (or back it out of the result created by PROC TABULATE) without having the dummy bets in each category.
How can I just show the result of PROC TABULATE above, but without the 0, 1 and ALL categories as the entries when the dummy is 1 are meaningless? Ideally, I'd like to see 'WITHOUT DUMMIES' as 0 and 'WITH DUMMIES' as 1 and display the result of the ALL column as the 'WTIH DUMMIES' = 1 category. I can then proceed to performin the KRUSKAL WALLIS on the 'NODE' class variable and the CHI-SQUARED on the dummy class variable because as it stands, I can't perform these tests with only the 0 category and the 1 category as classes in the tests.
If I could copy all the rows that are in category dummy = 0 into the category dummy = 1 it would solve the problem, I think.