0
votes

(first time posting)
I have a data set where I need to create a new variable (in SAS), based on meeting a condition related to another variable. So, the data contains three variables from a survey: Site, IDnumb (person), and Date. There can be multiple responses from different people but at the same site (see person 1 and 3 from site A).

Site  IDnumb    Date 
    a     1      6/12
    b     2       3/4
    c     4       5/1
    a     3       .  
    d     5       .  

I want to create a new variable called Complete, but it can't contain duplicates. So, when I go to proc freq, I want site A to be counted once, using the 6/12 Date of the Completed Survey. So basically, if a site is represented twice and contains a Date in one, I want to only count that one and ignore the duplicate site without a date.

                N    %

Complete        3   75%
Last Month      1   25%

My question may be around the NODUP and NODUPKEY possibilities. If I do a Proc Sort (nodupkey) by Site and Date, would that eliminate obs "a 3 ."?

Any help would be greatly appreciated. Sorry for the jumbled "table", as this is my first post (hints on making that better are also welcomed).

1
Are you deciding how to record this data or do you have data you need to work with?Michael Richardson
To post code or a table like that, enter it in without the BRs, highlight it, and then hit control-K.Joe
Can you post the desired output table? You say you want a new variable, do you really want that? What would it contain?Joe
I have data and I am looking to create some descriptive statistics. For example, the frequency of sites that have completed the survey. In the above example, I would like to create a table that shows each site once, and then whether or not they completed the survey (if Date is not missing). So, 3 of the 4 sites (75%) have completed the survey, and 1 of the 4 (25%) sites has completed in the current month.Jebediah15
@Joe Thanks for the formatting tip!Jebediah15

1 Answers

2
votes

You can do this a number of ways.

First off, you need a complete/not complete binary variable. If you're in the datastep anyway, might as well just do it all there.

proc sort data=yourdata;
by site date descending;
run;

data yourdata_want;
set yourdata;
by site date descending;
if first.site then do;
 comp = ifn(date>0,1,0);
 output;
end;
run;

proc freq data=yourdata_want;
tables comp;
run;

If you used NODUPKEY, you'd first sort it by SITE DATE DESCENDING, then by SITE with NODUPKEY. That way the latest date is up top. You also could format COMP to have the text labels you list rather than just 1/0.

You can also do it with a format on DATE, so you can skip the data step (still need the sort/sort nodupkey). Format all nonmissing values of DATE to "Complete" and missing value of date to "Last Month", then include the missing option in your proc freq.

Finally, you could do the table in SQL (though getting two rows like that is a bit harder, you have to UNION two queries together).