2
votes

I have a data set containing an unbalanced panel of observations, where I want to forward and backward fill missing and/or "wrong" observations of ticker with the latest non-missing string.

id time  ticker_have   ticker_want
------------------------------ 
1   1     ABCDE          YYYYY 
1   2     .              YYYYY 
1   3     .              YYYYY 
1   4     YYYYY          YYYYY   
1   5     .              YYYYY
------------------------------
2   4     .              ZZZZZ
2   5     ZZZZZ          ZZZZZ
2   6     .              ZZZZZ
------------------------------
3   1     .              .
------------------------------
4   2     OOOOO          OOOOO
4   3     OOOOO          OOOOO
4   4     OOOOO          OOOOO

Basically, if the observation already has a ticker, but this ticker is not the same as the latest non-empty ticker, we replace this ticker using the latest ticker.

So far, I have managed to fill missing observations forward using this code

proc sql;
create table have as select * from old_have order by id, time desc;
quit;

data want;
  drop temp;
  set have;
  by id;
  /* RETAIN the new variable*/
  retain temp; length temp $ 5;
  /* Reset TEMP when the BY-Group changes */
  if first.id then temp=' ';
  /* Assign TEMP when X is non-missing */
  if ticker ne ' ' then temp=ticker;
  /* When X is missing, assign the retained value of TEMP into X */
  else if ticker=' ' then ticker=temp;
run;

Now I am stuck figuring out the cases where I can't access the non-missing value using last.ticker or first.ticker ...

How would one do this using DATA or PROC SQL or any other SAS commands?

2

2 Answers

1
votes

You can do this several ways, but proc sql with some nested sub-queries is one solution.

(Read it from inside out, #1 then 2 then 3. You could build each subquery into a dataset first if it helps)

proc sql ;
  create table want as 
  /* #3 - match last ticker on id */
  select a.id, a.time, a.ticker_have, b.ticker_want
  from have a
       left join
        /* #2 - id and last ticker */
       (select x.id, x.ticker_have as ticker_want
        from have x
             inner join
              /* #1 - max time with a ticker per id */
             (select id, max(time) as mt
              from have
              where not missing(ticker_have)
              group by id) as y on x.id = y.id and x.time = y.mt) as b on a.id = b.id
  ;
quit ;
1
votes

Consider using a data step to retrieve the last ticker by time for each id, then joining it to main table. Also, use a CASE statement to conditionally assign new ticker if missing or not.

data LastTicker;
    set Tickers (where=(ticker_have ~=""));
    by id;  
    first = first.id;
    last = last.id; 
    if last = 1;
run;

proc sql;
    create table Tickers_Want as
    select t.id, t.time, t.ticker_have, 
           case when t.ticker_have = ""
                then l.ticker_have 
                else t.ticker_have 
           end as tickerwant
    from Tickers t
    left join LastTicker l
        on t.id = l.id
    order by t.id, t.time;
quit;

Data

data Tickers;
   length ticker_have $ 5;
   input id time ticker_have $;
   datalines;
1   1     ABCDE
1   2     .    
1   3     .    
1   4     YYYYY
1   5     .    
2   4     .    
2   5     ZZZZZ
2   6     .    
3   1     .    
4   2     OOOOO
4   3     OOOOO
4   4     OOOOO
;

Output

Obs id  time  ticker_have   tickerwant
1    1     1        ABCDE        ABCDE
2    1     2                     YYYYY
3    1     3                     YYYYY
4    1     4        YYYYY        YYYYY
5    1     5                     YYYYY
6    2     4                     ZZZZZ
7    2     5        ZZZZZ        ZZZZZ
8    2     6                     ZZZZZ
9    3     1    
10   4     2        OOOOO        OOOOO
11   4     3        OOOOO        OOOOO
12   4     4        OOOOO        OOOOO