Being a novice with SPSS, I am struggling to find duplicate cases based on a string variable in a dataset containing approximately 33,000 cases.
I have a variable named "nr" that is supposed to be a unique ID for every case. However, it turns out that some cases have been entered with two different values of "nr", the only difference being the last character. As a result, a single case shows up as two separate rows.
The structure of the variable "nr" is as follows: XX-XXXXXXX-X or X-XXXXXXX-X, i.e. 2-7-1 characters or 1-7-1 characters.
I would like to pick out all cases whose "nr" is identical to that of another case except for the last character.
To illustrate, with a successful syntax I would hope to be able to pick out cases like these from the whole dataset:
20-4026988-2
20-4026988-3
5-4026992-5
5-4026992-8
20-4027281-2
20-4027281-3
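To pin down the matching rule I have in mind, here is a rough sketch in Python (not SPSS syntax, only an illustration of the logic). It assumes that "identical except for the last character" can be checked by dropping the final character of "nr" and grouping on what remains. The function name `find_near_duplicates` and the extra ID `7-1234567-1` are made up for the example.

```python
from collections import defaultdict


def find_near_duplicates(ids):
    """Return groups of IDs that match once the last character is dropped.

    Assumption: two IDs refer to the same case when they are identical
    apart from their final character.
    """
    groups = defaultdict(list)
    for nr in ids:
        nr = nr.strip()          # string variables are often padded with blanks
        if not nr:
            continue             # skip empty values
        base = nr[:-1]           # everything except the last character
        groups[base].append(nr)
    # Keep only groups holding at least two different IDs.
    return {base: members for base, members in groups.items()
            if len(set(members)) > 1}


ids = [
    "20-4026988-2", "20-4026988-3",
    "5-4026992-5", "5-4026992-8",
    "20-4027281-2", "20-4027281-3",
    "7-1234567-1",               # hypothetical ID with no near-duplicate
]

for base, members in find_near_duplicates(ids).items():
    print(base, members)
```

With the IDs above, the three pairs are reported and the unpaired ID is left out.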
Does anyone have an idea how to write a syntax for this? I would be very grateful for any input!