I have a question about creating a variable that contains the mean of another variable, computed within groups defined by three variables.

My dataset has a bunch of observations for workers, including their education level (imagine it is a categorical variable with values 1, 2 and 3 meaning high-school dropout, finished high school and finished college respectively), their wage, the year when the observation was taken and the firm that they work in (a numerical ID). Each worker can work in multiple firms across the years.

I would like a variable that holds, for each worker, the mean wage of workers with the same education level in the same firm in a given year. So I need to collapse the dataset by 3 groups (year education firmID), and I am not sure how to do that. I am sorry I am not including any code; I am not sure how it would be helpful here. The dataset is also enormous, so the more efficient the code, the better.

Thank you so much in advance!

1 Answer

PROC MEANS with a CLASS statement containing your three groups is your simplest option. That's what it does, basically - summarize across class groupings.

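So first run PROC MEANS to write the group means to an output dataset, then merge that dataset back onto your master dataset by the same three variables, so each observation carries the mean for its year, education level and firm.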
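
A minimal sketch of those two steps. The dataset name `workers`, the variable names `year`, `education`, `firmID` and `wage`, and the output names `cell_means`, `mean_wage` and `workers_with_mean` are all assumptions for illustration; substitute your own.

```sas
/* Step 1: one row per year x education x firm cell, holding the mean wage. */
/* NWAY keeps only the full three-way combination (no marginal summaries);  */
/* NOPRINT suppresses the printed report, which matters on a large dataset. */
proc means data=workers nway noprint;
  class year education firmID;
  var wage;
  output out=cell_means (drop=_type_ _freq_) mean=mean_wage;
run;

/* Step 2: the master dataset must be sorted by the same keys for a         */
/* BY-group merge. With NWAY, cell_means already comes out in this order.   */
proc sort data=workers;
  by year education firmID;
run;

/* Step 3: attach the cell mean to every worker-level observation. */
data workers_with_mean;
  merge workers (in=in_master) cell_means;
  by year education firmID;
  if in_master;  /* keep only rows that exist in the master dataset */
run;
```

Note that observations with a missing value in any of the three class variables are left out of the PROC MEANS summary by default, so they would end up with a missing `mean_wage` after the merge. On an enormous dataset the sort is likely the expensive part; if the data are already sorted by these three variables you can skip it.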