I have a data set with 500,000 accounts.

I'm going to be running some analysis on this group and I will need a test and control.

I would like my control to be 2% (10,000 accounts) and my test to be the remaining 98%.

I know I can use a random variable or PROC SURVEYSELECT to get this 2% sample.

But the key thing here is that I want my test and control to have the same average for variable x (let's say account_age).

Is there any way in SAS, with PROC SURVEYSELECT or something else, to get a sample where one metric has the same average value for both groups (the whole table and the sampled subset from that table)?

1 Answer

What you are looking for is a stratified sample. In this case the stratification is by age.

You can do the following:

  • Sort the data by account_age
  • Take every 50th record (1 in 50, i.e. 2%) as the control group; the remaining records form the test group

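You can do this by taking every nth record (here n = 50) of the sorted data. Because the selected records are spread evenly across the whole range of account_age, the control group's average should end up very close to that of the test group.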
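
A minimal sketch of the steps above, assuming the accounts live in a dataset called work.accounts (a hypothetical name) with a numeric variable account_age. The output dataset names and the choice of the 25th record in each block of 50 are also assumptions for illustration. Picking a record from the middle of each block, rather than the last one, avoids always selecting the oldest account in the block.

```sas
/* Step 1: sort by the variable whose average should match. */
/* work.accounts is a hypothetical input dataset name.      */
proc sort data=work.accounts out=work.accounts_sorted;
    by account_age;
run;

/* Step 2: send one record out of every 50 to control,      */
/* everything else to test. With 500,000 rows this yields   */
/* 10,000 control and 490,000 test accounts.                */
data work.control work.test;
    set work.accounts_sorted;
    if mod(_n_, 50) = 25 then output work.control;
    else output work.test;
run;

/* Step 3: compare the group sizes and means as a check. */
proc means data=work.control n mean;
    var account_age;
run;

proc means data=work.test n mean;
    var account_age;
run;
```

The two means will not be identical, but they should be close, since each control account stands in for a narrow band of neighbouring ages. PROC SURVEYSELECT with METHOD=SYS and a CONTROL statement on account_age is a built-in way to do a similar systematic selection, if you prefer not to write the DATA step yourself.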