1
votes

Using the following code

   data mydata5;
input default$ numofkids$ count;
datalines;
good nochildren 1500
good kids1to2   2200
good kids3plus  300
bad nochildren  500
bad kids1to2    300
bad kids3plus   200
;
run;

I created a dataset

Obs default  numofkids count 
1   good      nochildr 1500 
2   good      kids1to2 2200 
3   good      kids3plu 300 
4   bad       nochildr 500 
5   bad       kids1to2 300 
6   bad       kids3plu 200 

What I've been trying to get to is something like this

           nochildren   other
good             1500   2500
bad              500    500

I've tried many a things but nothing has worked so far. I know there is any easy way out without getting into complicated codes.

I want to run a datastep where I can set mydata5 and create a dataset which will format like the way i want it with minimal coding required.

Could someone please offer some insights on this.

The purpose is then to run a proc freq to get a chisq test done.

I managed to make some progress with the code but my code does not produce the table like I want it. However, I am able to do a chisq test nonethless

    data mydata6;
set mydata5;
if numofkids='nochildren' then Group=1;
else Group=2;
run;

proc freq data=mydata6;
weight count;
tables default*Group/chisq;
run;

data mydata61;
set mydata5;
if numofkids='kids3plu' then Group=1;
else Group=2;
run;

proc freq data=mydata61;
weight count;
tables default*Group/chisq;
run;

Also, another thing I faced an issue with was when I tried to group the data, I had to specify numofkids=kids3plu instead of the whole string kids3plus. The data did not group if i specified the whole string. Can someone comment on this as well, please?

3

3 Answers

2
votes

I would use PROC SUMMARY/MEANS to do the sum and then transpose to create the format you are looking for.

I'm creating new data sets along the way that should help you see how this works.

data mydata5;
length default $4. numofkids $32.;
input default$ numofkids$ count;

datalines;
good nochildren 1500
good kids1to2   2200
good kids3plus  300
bad nochildren  500
bad kids1to2    300
bad kids3plus   200
;
run;

/*Populate a variable for "nochildren" and "other"*/
data mydata6;
set mydata5;
length kids $32.;
if numofkids = "nochildren"
    then kids=numofkids;
else
    kids = "other";
run;

proc sort data=mydata6;
by default kids;
run;

proc summary data=mydata6;
by default kids;
var count;
output out=mydata7 sum=;
run;

proc transpose data=mydata7 out=mydata8(drop=_name_);
by default;
id kids;
var count;
run;

Produces this: enter image description here

1
votes

Modify the first DATA step you have as follows in order to fix the concatenation to 8 characters problem:

data mydata5;
  length default numofkids $ 25;
  input default $ numofkids $ count;
  datalines;

Now, run PROC SORT followed by a DATA step to create your PROC FREQ-friendly variables. You'll need to use the by, last, and retain statements for SAS to remember previous rows in order to sum up columns to collapse them.

proc sort data=mydata5; by default; run;

data mydata6; set mydata5;
  by default;
  if numofkids="nochildren" then output;
  if numofkids="kids1to2" then hold1=count;
  if numofkids="kids3plus" then hold2=count;
  if last.default then do;
    numofkids="other";
    count=hold1+hold2;
    output;
  end;
  retain hold1 hold2;
run;

Now you can run your PROC FREQ.

0
votes
 data mydata5;
input default$ numofkids$ count;
datalines;
good nochildren 1500
good kids1to2   2200
good kids3plus  300
bad nochildren  500
bad kids1to2    300
bad kids3plus   200
;
run;

proc sort data=mydata5;
by default numofkids;
run; 

data edf;
set mydata5(rename=(count=noofchildren));
by default;
if first.default then count1=0;
if numofkids= 'nochildren' then output;
else count1+noofchildren;
other=count1;
if last.default then output;
keep default noofchildren other ;
run;


        Output will be like this:


                default   nochildren   other
                good           1500   2500
                bad            500    500