1
votes

Objective: Go from Have table + Help table to Want table. The current implementation (below) is slow. I believe this is a good example of how not to use SAS Macros, but I'm curious as to whether... 1. the macro approach could be salvaged / made fast enough to be viable (e.g. proc append is supposed to speed up the action of stacking datasets, but I was unable to see any performance gains.) 2. what all the alternatives would look like.

I have written a non-macro solution that I will post below for comparison sake.

Data: 
data have ; 
input name $ term $; 
cards;
Joe   2000 
Joe   2002
Joe   2008 
Sally 2001
Sally 2003
; run; 

proc print ; run; 

data help ; 
input terms $ ; 
cards; 
2000
2001
2002
2003
2004
2005
2006
2007
2008
; run; 

proc print ; run; 

data want ; 
input name $ term $ status $; 
cards;
Joe   2000  here
Joe   2001  gone
Joe   2002  here
Joe   2003  gone
Joe   2004  gone
Joe   2005  gone
Joe   2006  gone
Joe   2007  gone
Joe   2008  here
Sally 2001  here
Sally 2002  gone
Sally 2003  here
; run; 

proc print data=have ; run; 

I can write a little macro to get me there for each individual:

%MACRO RET(NAME); 
proc sql ; 
create table studtermlist as 
select distinct term 
from have 
where NAME = "&NAME"
; 
SELECT Max(TERM) INTO :MAXTERM 
FROM HAVE
WHERE NAME = "&NAME"
; 
SELECT MIN(TERM) INTO :MINTERM 
FROM HAVE
WHERE NAME = "&NAME"
; 
CREATE TABLE TERMLIST AS 
SELECT TERMS  
FROM HELP 
WHERE TERMS BETWEEN "&MINTERM." and "&MAXTERM."
ORDER BY TERMS 
;
CREATE TABLE HEREGONE_&Name AS 
SELECT 
A.terms , 
"&Name" as Name,
CASE 
WHEN TERMS EQ TERM THEN  'Here'
when term is null THEN 'Gone'
end as status
from termlist a left join studtermlist b 
 on a.terms eq b.term 
; 
quit; 
%MEND RET ; 


%RET(Joe);
%RET(Sally);

proc print data=HEREGONE_Joe; run; 
proc print data=HEREGONE_Sally; run; 

But it's incomplete. If I loop through for (presumably quite a few names)...

*******need procedure for all names - grab info on have ; 
proc sql noprint; 
select distinct name into :namelist separated by ' '
from have
; quit;

%let n=&sqlobs ; 


%MACRO RETYA ; 
OPTIONS NONOTEs ; 
%do i = 1 %to &n ; 
 %let currentvalue = %scan(&namelist,&i); 
 %put &currentvalue ; 
 %put &i ; 
%RET(&currentvalue);
%IF &i = 1 %then %do ; 
data base; set HEREGONE_&currentvalue; run; 
                 %end; 
%IF &i gt 1 %then %do ; 
proc sql ; create table base as 
select * from base
union 
select * from HEREGONE_&currentvalue
;
drop table HEREGONE_&currentvalue;
quit;
                 %end; 
%end ; 
OPTIONS NOTES; 
%MEND; 

%RETYA ; 

proc sort data=base ; by name terms; run; 
proc print data=base; run; 

So now I have want, but with 6,000 names, it takes over 20 minutes.

4
Are the TERMS in the HELP data set actually numeric, and are they continuous like in the example?DomPazz
What about not using a macro at all?Reeza
Terms are intentionally character. Actual terms can include multiple zeros (201000) and are stored as character.Wes McClintick
Hmm...my first solution assumes the opposite. I'll take another look at it later, but it depends on how your terms are stored.Reeza
Terms are intentionally character. Actual terms can include multiple zeros (201000) and are stored as character. They are continuous...albeit on an unusual scale: 200000, 200010, 200015, 20020, 200050, 200060, ... but this is taken care of by use of the "help" table. Thanks for your replies, i look forward to having a moment to scrutinize them.Wes McClintick

4 Answers

2
votes

Let's try the alternative solution. For each name find the min/max term via a proc SQL data step. Then use a data step to create the time period table and merge that with your original table.

*Sample data;
data have ; 
input name $ term ; 
cards;
Joe   2000 
Joe   2002
Joe   2008 
Sally 2001
Sally 2003
; run; 

*find min/max of each name;
proc sql;
create table terms as
select name, min(term) as term_min, max(term) as term_max
from have
group by name
order by name;
quit;

*Create table with the time periods for each name;
data empty;
set terms;
do term=term_min to term_max;
output;
end;
drop term_min term_max;
run;

*Create final table by merging the original table with table previously generated;
proc sql;
create table want as
select a.name, a.term, case when missing(b.term) then 'Gone'
                        else 'Here' end as status
from empty a
left join have b
on a.name=b.name
and a.term=b.term
order by a.name, a.term;
quit;

EDIT: Now looking at your macro solution, part of the problem is that you're scanning your table too many times.

  • The first table, studenttermlist is not required, the last join can be filtered instead.
  • The two macro variables, min/max term can be calculated in a single pass
  • Avoid the smaller interim term list and use a where clause to filter your results
  • Use Call Execute to call your macro rather than another macro loop
  • Rather than loop through to append the data, take advantage of a naming convention and use a single data step to append all outputs.

    %MACRO RET(NAME); 
    proc sql noprint; 
    
    SELECT MIN(TERM), Max(TERM) INTO :MINTERM,  :MAXTERM
    FROM HAVE
    WHERE NAME = "&NAME"
    ; 
    
    
    CREATE TABLE _HG_&Name AS 
    SELECT 
    A.terms , 
    "&Name" as Name,
    CASE 
    WHEN TERMS EQ TERM THEN  'Here'
    when term is null THEN 'Gone'
    end as status
    from help a 
    left join have b 
     on a.terms eq b.term 
     and b.name="&name"
     where a.terms between "&minterm" and "&maxterm";
    ; 
    quit; 
    %MEND RET ; 
    
    
    *call macro;
    proc sort data=have;
    by name term;
    run;
    
    data _null_;
        set have;
        by name;
        if first.name then do;
        str=catt('%ret(', name, ');');
        call execute(str);
        end;
    run;
    
    
    *append results;
    data all;
        set _hg:;
    run;
    
1
votes

You can actually do this in a single nested SQL query. It would be messy and hard to read.

I'm going to break it out into the three components.

First, get the distinct names;

proc sql noprint;
create table names as
select distinct name from have;
quit;

Second, Cartesian product names and terms to get all the combos.

proc sql noprint;
create table temp as
select a.name, b.terms as term
from names as a,
     help as b;
quit;

Third, left join to find the matches

proc sql noprint;
create table want as
select a.name,
       a.term,
       case
          when missing(b.term) then "gone"
          else "here"
       end as Status
from temp as a
left join
     have as b
on a.name=b.name
and a.term=b.term;
quit;

Last, delete the temp table to save space;

proc datasets lib=work nolist;
delete temp;
run;
quit;

As Reeza shows, there are other ways to do this. As I said above, you can merge all this into a single SQL join and get the results you want. Depending on computer memory and data size, it should be OK (and might be faster as everything is in memory).

1
votes
proc sql;
create table want as
select c.name, c.terms, a.term, 
       ( case when missing(a.term) then "Gone"
           else "Here" end ) as status
from (select distinct a.name, b.terms
      from have a, help b) c
left join have a
on c.terms = a.term and c.name = a.name
order by c.name, c.terms, a.term
;
0
votes

I'm going to throw in my similar answer so I can compare them all later.

proc sql ; 
create table studtermlist as 
select distinct term,name 
from have 
; 
create table MAXMINTERM as 
SELECT Max(TERM) as MAXTERM, Min(TERM) as MINTERM, name  
FROM HAVE
GROUP BY name
; 
CREATE TABLE TERMLIST AS 
SELECT TERMS,name  
FROM HELP a,MAXMINTERM b 
WHERE TERMS BETWEEN MINTERM and MAXTERM
ORDER BY name,TERMS 
;
CREATE TABLE HEREGONE AS 
SELECT 
a.terms , 
a.Name  ,
CASE 
WHEN TERMS EQ TERM THEN  'Here'
when term is null THEN 'Gone'
end as status
from termlist a left join studtermlist b 
 on a.terms eq b.term
 and a.name eq b.name 
order by name, terms
; 
quit;