0
votes

I am trying to find the minimum value within a group at a point in time in SAS. For example, I have the following dataset

ID NO| DATE      |RESULT  

1    |28NOV2015   |5.6  
1    |25JAN2016   |4.8  
1    |8MAR2016    |5.2  
1    |14MAY2016   |5.0  
1    |20JUL2016   |4.3    
1    |18SEP2016   |3.8  
2    |20AUG2012   |8.6  
2    |14NOV2012   |7.2  
2    |03JAN2013   |5.5  
2    |12MAR2013   |5.3  

I would like to programatically create a new column that will return the minimum result within the ID number group for values that occurred prior to the current value. The resulting output should look as follows:

ID NO| DATE       |RESULT |PMINVAL  
-----|------------|-------|--------  
1    |28NOV2015   |5.6    |5.6  
1    |25JAN2016   |4.8    |5.6  
1    |8MAR2016    |5.2    |4.8  
1    |14MAY2016   |5.0    |4.8  
1    |20JUL2016   |4.3    |4.8  
1    |18SEP2016   |3.8    |4.3  
2    |20AUG2012   |8.6    |8.6  
2    |14NOV2012   |7.2    |8.6  
2    |03JAN2013   |5.5    |7.2  
2    |12MAR2013   |5.3    |5.5  

I tried to accomplish this by using a lag function in combination with some conditional statements. I do not think this approach will work since the lag function when used with conditionals evaluates to the last record where the condition was met, which is what is desired in this case. I first created an additional column that defined the "baseline" or first record in each ID group to use as a reference for the first two PMINVAL records since these will always be the baseline result value.

data MDX2;  
set MDX1;  
count + 1;  
by ID_NO;  
if first.ID_NO then count=1;  
run;  
data MDX3;  
set MDX2;  
if count <=2 then PMINVAL=BLRESULT;  
if count >2 and lag(PMINVAL)<=lag(RESULT) then PMINVAL=lag(PMINVAL);  
if count >2 and lag(PMINVAL)>lag(RESULT) then PMINVAL=lag(RESULT);  
run;
2
Please post code that you have used to try and do this.Stu Sztukowski
Why is the second value 5.6 instead of 4.8?Tom
@Tom the second value is 5.6 because I am looking for the minimum value that has occurred prior to the result date. Since 4.8 is the result on the current date, the comparison value must come from a previous date.AUN
Ok. Then no need to count or use LAG(). Just output before calculating new minimum.Tom

2 Answers

2
votes

Just use the MIN() function and retain the value. Output the value BEFORE calculating the new minimum. This will work if you have at most one result per day.

data have ;
  input id date result;
  informat date date. ;
  format date date9.;
cards;
1 28NOV2015 5.6
1 25JAN2016 4.8
1  8MAR2016 5.2
1 14MAY2016 5.0
1 20JUL2016 4.3
1 18SEP2016 3.8
2 20AUG2012 8.6
2 14NOV2012 7.2
2 03JAN2013 5.5
2 12MAR2013 5.3
;

data want;
  set have ;
  by id date;
  if first.id then pminval=result;
  retain pminval;
  output;
  pminval=min(pminval,result);
run;

If you have multiple results per day then you might need to add another variable and some extra logic.

data want;
  do until (last.date);
    set have ;
    by id date;
    if first.id then pminval=result;
    daymin=min(daymin,result);
    output;
  end;
  retain pminval;
  pminval=min(pminval,daymin);
  drop daymin;
run;
1
votes

Using RETAIN and a DATA step:

DATA have;
INPUT ID_NO DATE :DATE9. RESULT;
FORMAT DATE DATE9.;
DATALINES;
1 28NOV2015 5.6
1 25JAN2016 4.8
1 8MAR2016 5.2
1 14MAY2016 5.0
1 20JUL2016 4.3
1 18SEP2016 3.8
2 20AUG2012 8.6
2 14NOV2012 7.2
2 03JAN2013 5.5
2 12MAR2013 5.3
;


DATA want;

    SET have;
    BY ID_NO;

    RETAIN PRIOR_RESULT;
    RETAIN PRIOR_PMINVAL;

    /* "Reset" the PMINVAL when first record in set is reached */
    IF FIRST.ID_NO THEN DO;
        PMINVAL = RESULT;
        PRIOR_RESULT = RESULT;
        PRIOR_PMINVAL = RESULT;
    END;

    /* Return the MIN between the prior RESULT and PMINVAL values */
    ELSE DO;
        PMINVAL = MIN(PRIOR_RESULT,PRIOR_PMINVAL);
        PRIOR_RESULT = RESULT;
        PRIOR_PMINVAL = PMINVAL;
    END;

    DROP PRIOR_RESULT PRIOR_PMINVAL;

RUN;

Result:
enter image description here