1
votes

I have a dataset which broadly speaking looks like this:

╔════════╦══════════╦══════════╗
║ TICKER ║   DAY    ║   TIME   ║
╠════════╬══════════╬══════════╣
║ AET    ║ 20100104 ║ 09:30:04 ║
║ AET    ║ 20100104 ║ 09:30:04 ║
║ AET    ║ 20100104 ║ 09:30:04 ║
║ AET    ║ 20100104 ║ 09:30:05 ║
║ AET    ║ 20100104 ║ 09:30:05 ║
║ AET    ║ 20100104 ║ 09:30:06 ║
║ AET    ║ 20100104 ║ 09:30:06 ║
║ AET    ║ 20100104 ║ 09:30:06 ║
╚════════╩══════════╩══════════╝

I want an extra column added which represents a value of 0 in case it's not the last observation of that time and a 1 in case it's the last observation of that time. Therefore I want to add add the following column:

╔════════╦══════════╦══════════╦═══════╗
║ TICKER ║   DAY    ║   TIME   ║ dummy ║
╠════════╬══════════╬══════════╬═══════╣
║ AET    ║ 20100104 ║ 09:30:04 ║     0 ║
║ AET    ║ 20100104 ║ 09:30:04 ║     0 ║
║ AET    ║ 20100104 ║ 09:30:04 ║     1 ║
║ AET    ║ 20100104 ║ 09:30:05 ║     0 ║
║ AET    ║ 20100104 ║ 09:30:05 ║     1 ║
║ AET    ║ 20100104 ║ 09:30:06 ║     0 ║
║ AET    ║ 20100104 ║ 09:30:06 ║     0 ║
║ AET    ║ 20100104 ║ 09:30:06 ║     1 ║
╚════════╩══════════╩══════════╩═══════╝

I'm usying SAS Enterprise Guide, but don't know how to compare a value of row 1 with a value of row 2. What I'm basically trying to do is create an expression that looks like this:

IF TIME = TIME(row+1)
THEN 0
ELSE 1

I don't know if this is even possible in SAS enterprise. I'm totally new to this, and I would have no problem do such a thing in Excel. Excel just doesn't cope well with 17 million rows of data. Thanks in advance!

2

2 Answers

2
votes

You need to use the last. function here. This can be used as long as the the data is sorted appropriately (e.g. below it will need to be sorted by TICKER, DAY and TIME)

data new;
  set original;
  by TICKER DAY TIME;
  if last.TIME then dummy = '1';
run;

(If it is not then you will need to do a proc sort, however from the sample data you posted it was already)

Edit: Thanks @Joe

0
votes

This is a rather crude way of going about it, but it does work... I'm not 100% sure if there is a function for this but this is what I came up with quickly...

//Create index of each obs.
DATA DATE_TIME_TABLE;
SET DATE_TIME_TABLE;
   COLUMN INDEX;
   INDEX=_N_;
   OUTPUT;
RUN;

//Sort by descending index for each time. Last obs of each time now is first.
PROC SORT DATA=DATE_TIME;
BY TIME DESCENDING INDEX;
RUN;

//The first obs = 1 everything else = 0
DATA DATE_TIME_TABLE;
SET DATE_TIME_TABLE;
   IF TIME <> LAG(TIME) THEN
      dummy = 1;
   ELSE dummy = 0;
   END;
RUN;

//Put the table back to how it was
PROC SORT DATA=DATE_TIME_TABLE;
BY INDEX;
RUN;