Looking to expand data without interpolating in SAS

Question

I have some video coding data that were coded in 5-minute intervals. The variables I am working with are the time at which the behavior changed (relative to the beginning of the 5-minute video, in seconds and milliseconds), which has a different number of observations per individual per 5 minutes, and the behavior itself. For example:

Participant     Time_Relative      Behavior
     1               0                3
     1               123.3            4
     1               153.6            1
     1               300              4
     2                0               5
     2                360             3

What I am looking to do is expand the data so that each participant has an equal number of observations (participants currently have between 3 and 33 observations), say one observation every 0.5 seconds across the five-minute video, with the behavior staying the same until the time of change. I've tried multiple PROC EXPAND options (method=none, from= and to=, factor(x,x), etc.), but it keeps either interpolating and averaging the behavior, or interpolating the time variable to strange values that don't make any sense!

For example, I want participants 1 and 2 to have the same number of time observations and behaviors:

Participant  Time_elapsed_from_video_start(seconds)   Behavior
     1                         0                         3
     1                         .5                        3
     1                         1                         3
     1                         1.5                       3
     .                            
     . 
     .
     1                         123                       4
     1                         123.5                     4
     2                         0                         5
     2                         .5                        5
     2                         1                         5
     2                         1.5                       5
     .                            
     . 
     .
     2                         123                       5
     2                         123.5                     5

(The dots represent an ellipsis continuing to the end of the data for participants 1 and 2, NOT missing data.) In the end, I am trying to have the same number of observations for each participant, one every half second across the 300-second video (about 600 observations each), rounding the actual change times to the nearest half second. The reported behavior should stay the same until the actual change is observed at the nearest half second.

For Participant 1, why does row 3 have time 3.3 when row 2 has 123.3? Are the time values relative to the prior time the behavior changed? In other words, is time_relative really a time_duration of the behavior noted in the prior row? - Richard
Hi Richard, that was a mistake on my part. No, we have another variable that holds the duration of the observed behavior. This variable should be the time relative to the beginning of the video, in seconds, so when sorted it should run from 0 to 300. - Dakota Witzel

Richard · Accepted Answer · 2019-10-13T08:27:02

PROC EXPAND works with named intervals when converting an aperiodic series to a periodic one, and cannot at the same time use the FACTOR= option (to get to, say, half seconds).

Presuming the time values in the data are actually durations rather than time stamps, an intermediate step is needed to convert the durations into time elapsed from the start.

data have;
  input id time_relative response;
  datalines;
 1 0      3
 1 123.3  4
 1 3.3    1
 1 300    4
 2 0      5
 2 360    3
;
run;

data have2;
  set have;
  by id;

  /* running total converts per-row durations into elapsed time from the start */
  if first.id then time = time_relative; else time + time_relative;
run;

/* From Help -- Overview: EXPAND Procedure
 * You can also convert aperiodic series, observed at arbitrary points in time, 
 * into periodic estimates. For example, a series of randomly timed quality control 
 * spot-check results might be interpolated to form estimates of monthly average defect rates. 
 */

proc expand data=have2 out=want to=second;
  by id;
  id time;
  /* METHOD=STEP fits a piecewise-constant (step) curve instead of interpolating */
  convert response=response2 / method=step; 
  format time 6.2;
run;

From the PROC EXPAND documentation:

ID Statement

ID variable;

The ID statement names a numeric variable that identifies observations in the input and output data sets. The ID variable’s values are assumed to be SAS date or datetime values.

The input data must form time series. This means that the observations in the input data set must be sorted by the ID variable (within the BY variables, if any). Moreover, there should be no duplicate observations, and no two observations should have ID values within the same time interval as defined by the FROM= option.
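To satisfy the sorting requirement, the input can be sorted by the BY variable and then the ID variable before calling PROC EXPAND. A minimal sketch, using a hypothetical unsorted data set named have2_example that contains one repeated time stamp; note that NODUPKEY only removes rows whose id and time are identical, not rows that merely fall within the same interval.

```sas
/* Hypothetical unsorted input with a duplicated id/time pair */
data have2_example;
  input id time response;
  datalines;
1 123.3 4
1 0     3
1 123.3 4
2 0     5
;
run;

/* Sort by the BY variable (id), then the ID variable (time);   */
/* NODUPKEY keeps only the first row for each id/time pair       */
proc sort data=have2_example out=have2_sorted nodupkey;
  by id time;
run;
```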

You can code your own 'expand' in a DATA step:

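data want (keep=id X Y rename=(X=time Y=response));

  length id X Y 8;

  X = 0;   /* next half-second grid point to write */
  Y = .;   /* behavior in effect before the first change */

  do until (last.id);   /* one DATA step iteration per id */

    set have;
    by id;

    /* accumulate durations into elapsed time from the start */
    if first.id 
      then time = time_relative; 
      else time + time_relative;

    /* write grid points before this change, carrying the prior behavior */
    do X = X by 0.5 while (X < time);
      output;      
    end;

    Y = response;
  end;  

  /* one final row carrying the last behavior */
  output;

  format X 6.1;
run;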
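Per the comments, Time_Relative is actually a time stamp (seconds from the video start), not a duration, so the running total is not needed for the asker's data. A minimal DATA step sketch for that case, assuming the data are sorted by participant and time_relative and that each change time is rounded to the nearest half second with ROUND(time_relative, 0.5). It writes one row every 0.5 s from 0 to 300 (601 rows per participant) and carries the most recent behavior forward; changes after 300 s, such as participant 2's at 360, never appear. With nearest-half-second rounding, 123.3 maps to 123.5, so the row at 123 still shows behavior 3. The helper variables current, change, and new_behavior are illustrative names, not part of the original answer.

```sas
/* Sample data from the question: time_relative = seconds from video start */
data have;
  input participant time_relative behavior;
  datalines;
1 0     3
1 123.3 4
1 153.6 1
1 300   4
2 0     5
2 360   3
;
run;

/* One row per half second from 0 to 300, behavior carried forward */
data want (keep=participant time behavior);
  length participant time behavior 8;
  time = 0;        /* next grid point to write */
  current = .;     /* behavior in effect at that grid point */

  do until (last.participant);   /* one DATA step iteration per participant */
    set have (rename=(behavior=new_behavior));
    by participant;

    change = round(time_relative, 0.5);   /* nearest half second */

    /* write grid points before this change with the previous behavior */
    do while (time < change and time <= 300);
      behavior = current;
      output;
      time + 0.5;
    end;

    current = new_behavior;
  end;

  /* fill the rest of the 300-second video with the last behavior */
  do while (time <= 300);
    behavior = current;
    output;
    time + 0.5;
  end;

  format time 6.1;
run;
```

If two changes round to the same half second, only the later behavior appears; if a participant's first recorded change is after 0, the rows before it have a missing behavior.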

Looking to expand data without interpolating in SAS

1 Answers