1
votes

I have the information shown in the first four columns in the table below and would like to add the column Calls_previous_3_days containing the sum of calls from each CustID to each Area the previous three dates.

I.e., if a custumer made a call to Support on 17Jan2015 I would like the new variable to show the sum of the number of calls that the customer made to Support during the period 14Jan2015-16Jan2015.

How do I calculate the sum in the column Calls_previous_3_days dependent on the CustID, Area and Date?

CustID  Area     Date       Calls   Calls_previous_3_days
3137    Support  05Jan2015  1       0
3137    Support  14Jan2015  4       0 
3137    Support  16Jan2015  1       4
3137    Support  17Jan2015  1       5
3137    Support  20Jan2015  2       1
3137    Support  22Jan2015  1       2
5225    Support  26Jan2015  1       0
5225    Support  27Jan2015  1       1
5225    Support  28Jan2015  1       2
5225    Sales    14Feb2015  1       0       
5225    Sales    15Feb2015  1       1
5225    Sales    22Feb2015  1       0
2
I'd suggest looking up the lag function - that should be a good starting point. Linkuser667489

2 Answers

0
votes

You can achieve this with arrays, by storing the last three values and dates and then summing where the dates satisfy your criteria.

proc sort data = have;
    by CustID Area;;
run;
proc print;
data want;
    set have;
    by CustID Area;
    /* Create temporary array (which automatically retains) */
    array hist{3,2} _temporary_;
    /* Initialise the array values if starting a new group */
    if first.Area then call missing(of hist[*]);
    /* Sum the values in the array that satisfy your condition */
    callsp3 = sum(
        0,
        (sum(Date, - hist[1,2]) <= 3) * hist[1,1],
        (sum(Date, - hist[2,2]) <= 3) * hist[2,1],
        (sum(Date, - hist[3,2]) <= 3) * hist[3,1]
    );
    /* Store the current value/date in the array and shift old values down */
    hist[1,1] = hist[2,1];
    hist[1,2] = hist[2,2];
    hist[2,1] = hist[3,1];
    hist[2,2] = hist[3,2];
    hist[3,1] = Calls;
    hist[3,2] = Date;
run;

There are (as always with SAS) several ways to approach this problem. You could also investigate the lag() functions or use proc sql to self join the data with an on clause like to specify your condition. I prefer the array approach as lag() has some gotchas and self joining will be slower. However, if it is likely that you will want different or longer windows then the array approach can become somewhat unwieldy due to the length of the code. This can be mitigated if you are confident in the macro language, but it may then be better to investigate a different approach.


If you have SAS/ETS licensed you should be able to achieve this with proc expand. I cannot confirm this does exactly what you want though as I don't have a licence.

proc expand data = have out = want from = day to = day;
    by CustID Area;
    id Date;
    convert calls = callsp3 / method = none transformout = (movsum 4);
run;
data want; 
    set want (where = (calls ~= .));
    callsp3 = callsp3 - calls;
run;
0
votes

The LAG function, as @SRSwift detailed, certainly works well.

Also, here is a SQL solution. Used a Data Step to substitute 0 for missing, not sure if a CASE statement in the SQL step would be the best way to do this or not.

Data HAVE;
Input custid $ area $ date:date9. calls;
format date date9.;
datalines;
3137    Support  05Jan2015  1
3137    Support  14Jan2015  4
3137    Support  16Jan2015  1
3137    Support  17Jan2015  1
3137    Support  20Jan2015  2
3137    Support  22Jan2015  1
5225    Support  26Jan2015  1
5225    Support  27Jan2015  1
5225    Support  28Jan2015  1
5225    Sales    14Feb2015  1
5225    Sales    15Feb2015  1
5225    Sales    22Feb2015  1
;
Run;

Proc sql;
Create table WANT as
Select custid,area,date,calls,
(select sum(calls) from have b where a.custid=b.custid and 
  a.area=a.area and (a.date-3<=b.date<a.date)) as Calls_Previous_3_Days
From HAVE a;
Quit;

Data WANT;
Set WANT;
If missing(calls_previous_3_days) then Calls_Previous_3_Days=0;
Run;