1
votes

I have a dataset that is sorted by 6 variables.

I want to use the first.variable ( in my case the sixth variable) to set initial values for a new variables (7th,8th variables) of the data set.

Example:

if first.variable_name then do;
ratevalue = 999;
factor = 100.00;
end;

the first.variable is the 6th variable in the groupby.

The first column in the group has a date value of '3-20-2017' hardcoded. So there is only 1 group with the fist column comprising of all 200K observations.

Question is when I execute the above piece of code I am expecting the ratevalue and factor assigned to the observations where first.variable_name = '1'.

However, the the values are assigned to all the 200k observation starting with the first.variable.

If i use

if last.variable_name then do;

'ratevalue = 999;'
factor = 100.00;
end;

then it is assigning the above values to all the observations starting from the last observation of the group made by the 6th variable in the group.

Is that how it is supposed to work.

Thanks!

1
If your 6 BY variables uniquely identify each row then the last of the variables in the list will make single observation groups that always are both the first and the last. Is that what it going on? if not then post a simple example with just a few observations and a couple of BY variables to demonstrate your question.Tom
Tested and Tom should be right; data wanted; by date_var; set smt; if first.var6 then ... should yield what you want.pinegulf

1 Answers

0
votes

You should be able to have the output you want without issues. Make sure your data is sorted before grouping: you can either use a proc sort or the notsorted in the BY statement.

See my example below with 7 columns, I am grouping by the first 6:

data have;
input
  Period    Region $  city $ Sales1    Sales2    Sales3     Sales4     Sales5  total ;
  datalines;
   1         North     XY  1 2 2  2 2 30
   1         North     XY  1 2 2 2 2 40
   1         South     ZZ  1 1 1  2 2 100
   1         South     ZY  1 1 0 1 1 40
   ;
run;

data want;
set have;
by Period    Region   city  Sales1    Sales2    Sales3     Sales4     Sales5 notsorted;
if first.Sales5 = 1 then do; ratevalue=999; factor=100.00; end;
run;

Output:

 Period=1 Region=North city=XY Sales1=1 Sales2=2 Sales3=2 Sales4=2 Sales5=2 total=30 f_s5=1 l_s5=0 ratevalue=999 factor=100
 Period=1 Region=North city=XY Sales1=1 Sales2=2 Sales3=2 Sales4=2 Sales5=2 total=40 f_s5=0 l_s5=1 ratevalue=. factor=.
 Period=1 Region=South city=ZZ Sales1=1 Sales2=1 Sales3=1 Sales4=2 Sales5=2 total=100 f_s5=1 l_s5=1 ratevalue=999 factor=100
 Period=1 Region=South city=ZY Sales1=1 Sales2=1 Sales3=0 Sales4=1 Sales5=1 total=40 f_s5=1 l_s5=1 ratevalue=999