0
votes

I am new to sas and need some help (yes, Ive looked through everything - maybe I am just not asking it the right way but here I am): lets say I want to create a dataset from sashelp.cars and I want there to be 5 observations for every make:

ie: 5 obs for acura, 5 obs for audi, 5 obs for bmw etc. ANd I want all the data returned, but only limited to the 5 observations per make.

How would I do this without a macro but a loop instead? My actual data set has 93 distinct values and I don't want to use 93 macro calls

Thanks in advance!!!!

1
You need to post what you've tried as well here. If you have no idea of where to start, you need to use an explicit OUTPUT statement and BY group processing with the FIRST statement. The code is about 10 lines in total, a PROC SORT (assuming it's not sorted) and then a data step.Reeza

1 Answers

1
votes

Which 5 obs do you want for each make? The first 5? The last 5? Some sort of random sample?

If it's the latter, proc surveyselect is the way to go:

proc sort data = sashelp.cars out = cars;
  by make;
run;

proc surveyselect 
  data = cars
  out = mysample
  method = URS 
  n = 5
  selectall;
  strata make;
run;

Setting method = URS requests unrestricted random sampling with replacement. As this allows the same row to be selected multiple times, we are guaranteed 5 rows per make in the sample, even if there are < 5 in the input dataset. If you just want to take all available rows in that scenario, you can use method = srs to request simple random sampling.

If you want the first 5 per make, then sort as before, then use a data step:

data mysample;
  set cars;
  by make;
  if first.make then rowcount = 0;
  rowcount + 1;
  if rowcount <= 5;
run;

Getting the last 5 rows per make is very similar - if you have a key column that you can use to reverse the order within each make, that's the simplest option.