I have a dataset with the following variables: Student_ID, Start_week, Sy, Item, Type, School, Tuition and Country. Two observations are duplicates if they share the same combination of Student_ID, Start_week, Sy, Item, Type and School.
For instance:
Student_ID Start_week Sy Item Type School
10001 1 11 101 0 2
10001 1 11 101 0 2
These two observations are duplicates because they have the same values for that combination. This is what I tried:
proc freq data = mydataset;
by Student_ID;
tables Start_week Sy Item Type School;
run;
However, this didn't really help me see which observations are duplicates and which are not. I wanted to create a count variable to count the duplicates, but it only captured Student_ID, not the full combination. Moreover, proc freq ran out of memory.
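To make the goal concrete, the kind of count I have in mind could be sketched roughly as below. This is only a sketch, assuming PROC SQL is acceptable here; the output table name dup_keys and the column dup_count are made-up names for illustration, and I have not checked how it behaves on a dataset of my size.

```sas
/* Sketch: count rows per key combination and keep only the      */
/* combinations that occur more than once.                        */
/* dup_keys and dup_count are hypothetical names for illustration */
proc sql;
    create table dup_keys as
    select Student_ID, Start_week, Sy, Item, Type, School,
           count(*) as dup_count
    from mydataset
    group by Student_ID, Start_week, Sy, Item, Type, School
    having count(*) > 1;
quit;
```

Each row of dup_keys would then be one duplicated combination, with dup_count giving how many observations share it.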
What are some effective ways to identify the duplicate observations?