I have a data set with two main variables of interest now - Major and Major_Code. These should match up 1 to 1 but there are some errors I need to fix and what I've found is that for 14 Major_Code values, there are two different Majors. This is only due to a change in spelling or punctuation, like "ed." and "education". They are supposed to have the same value here but don't.
So I have a table with 7 pairs. Each pair has the same Major_Code and different a Major. How can I select one of the Major vales to use for each code? My only idea was through an if-then statement but that seems horribly inefficient.
I found the doubled values like this:
proc freq data=majorslist; tables Major_Code/out=majorcodedups; run;
proc print data=majorcodedups; where COUNT > 1; run;
So I can easily find these observations but can't extract certain values to overwrite onto another observation. I've looked into arrays, macros, sql and transpose but it's all a bit over my head right now.
Logically it would work like this:
from obs i to n, find value for variable x at obs i, output value onto variable y at obs i, go to obs(i+1) and repeat.