2
votes

Apologies in advance for the lengthy post. I am only nominally familiar with Python, but I think it might be able to accomplish this task easily. Some background: I have survey data where respondents were asked to select the two schools they’re considering applying to out of a list of 1500 or so. The data are stored as two variables (one per institution selected – vname “Institution_1”, “Institution_2”) where each value uniquely identifies a particular institution.

Later on, respondents rate the institutions they selected on a 1 to 6 scale on a series of attributes. Each of these ratings is stored as a separate scale variable in the data, and I have two of them per attribute, corresponding to the position in which the institution was selected. If, for example, Adelphi University is “Institution_1” then the rating on “Core academics” is stored in variable “Q.32_combined_1”; if Adelphi University is “Institution_2” then the rating on “Core academics” is stored in variable “Q.36_combined_1”.

I want to combine the ratings for each institution and here’s the SPSS syntax for doing so for this one institution (Adelphi is uniquely identified with a meaningful value of 188429):

DO IF (Institution_1 = 188429).
COMPUTE Adelphi_CoreAcad=Q.32_combined_1.
ELSE IF (Institution_2 = 188429).
COMPUTE Adelphi_CoreAcad =Q.36_combined_1.
END IF.
EXECUTE.

But we have 1,000+ institutions in our data. How can we create a variable for each unique value across these two lists (Institution_1 and Institution_2)? Is there a way to use Python to create these variables and/or build the SPSS syntax that would do so?

Thanks!

I don't know what you're doing, but I'm pretty sure you're doing it wrong. ;-) What are you trying to get? Are you trying to get ratings by school? Are you trying to get some data about respondents? - aghast
So, you've basically got two unique integer ids, and you want a Python script that will combine them into a single unique integer id? If so does this id need any properties, e.g. sequential, no gaps... - Denziloe
I suspect this actually relates to the Python integration with SPSS, yes? Can you edit the question to explain how you have the data stored? Is there one .sav file per questionnaire, or what? - aghast
Yes, I'm trying to get ratings by school. Respondents chose two schools, which are stored in two variables whose values are unique integers. The data are stored as one row per respondent in a single .sav file. I want one variable per school, built from the unique values across these two lists, so I know how respondents who selected that school rated it, regardless of whether it was their first or second selection. The ratings are stored in Q.32_combined_1 for Institution_1 (their first selection) and in Q.36_combined_1 for Institution_2 (their second selection). - Mughees Khan

2 Answers

2
votes

Try this. It's rough, since I don't have SPSS, but I think it's what you're asking for. (Note: I'm not sure that what you're asking for is the right thing, but see if it works, and maybe we'll go from there.)

This creates a set of variables named U188429_CoreAcad, etc., where the U is just a leading prefix ("U" for "Unit ID"), 188429 is the unit id, and "CoreAcad" is a made-up string you can change.

I used categories 'CoreAcad', 'PrettyCoeds', 'FootballTeam' and 'Drinking', because if I had it all to do over again, that's how I would have rated schools. (Except for 'CoreAcad,' which was your thing.)

I assumed that your categories were 32-35 for institution 1, and 36-39 for institution 2. You can change those below as well.

I assumed that you can spss.Submit a bunch of lines together. If not, split the string up and submit the lines one at a time.

I commented out "BEGIN PROGRAM", "import spss", "END PROGRAM" because I'm just feeding stuff into a command-line python2.7. Uncomment those for your use.

#BEGIN PROGRAM.
#import spss, spssaux

# According to the internet, unitids are sparse values.
Unit_ids = [
        188429, # Adelphi
        188430, # Random #s
        171204,
        100001,
]

Categories = {
    'CoreAcad' : ('Q.32_combined_1', 'Q.36_combined_1'),
    'PrettyCoeds' : ('Q.33_combined_1', 'Q.37_combined_1'),
    'FootballTeam' : ('Q.34_combined_1', 'Q.38_combined_1'),
    'Drinking' : ('Q.35_combined_1', 'Q.39_combined_1'),
}


code = """
DO IF (Institution_1 = %(unitid)d).
COMPUTE U%(unitid)d_%(category)s = %(answer1)s.
ELSE IF (Institution_2 = %(unitid)d).
COMPUTE U%(unitid)d_%(category)s = %(answer2)s.
END IF.
EXECUTE.
"""
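for unitid in Unit_ids:
    # .items() works on both Python 2 and Python 3 (iteritems() is Python 2 only).
    for category, answers in Categories.items():
        answer1, answer2 = answers
        # locals() supplies unitid, category, answer1 and answer2 to the template.
        print(code % locals())
        #spss.Submit(code % locals())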


#END PROGRAM.
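
The script above hard-codes Unit_ids, while the question asks for every unique value across Institution_1 and Institution_2. The following is only a sketch of how that list could be derived and the syntax built in one go. The function names unique_unit_ids and build_syntax are hypothetical, the sample rows are made up, and how you read the two columns out of the active dataset inside the SPSS Python integration is deliberately left out. It also emits a single EXECUTE at the end instead of one per block, on the assumption that one data pass for all the pending COMPUTEs is preferable with 1,000+ institutions.

```python
def unique_unit_ids(rows):
    """Return the sorted distinct institution ids found in either column."""
    ids = set()
    for inst1, inst2 in rows:
        for value in (inst1, inst2):
            if value is not None:  # skip missing selections
                ids.add(int(value))
    return sorted(ids)


def build_syntax(unit_ids, categories):
    """Build one DO IF block per (institution, category) pair, then one EXECUTE."""
    template = (
        "DO IF (Institution_1 = {uid}).\n"
        "COMPUTE U{uid}_{cat} = {first}.\n"
        "ELSE IF (Institution_2 = {uid}).\n"
        "COMPUTE U{uid}_{cat} = {second}.\n"
        "END IF.\n"
    )
    blocks = [
        template.format(uid=uid, cat=cat, first=first, second=second)
        for uid in unit_ids
        for cat, (first, second) in sorted(categories.items())
    ]
    return "".join(blocks) + "EXECUTE.\n"


# Hypothetical sample: one (Institution_1, Institution_2) pair per respondent.
sample_rows = [(188429, 171204), (171204, 188429), (100001, None)]
sample_categories = {"CoreAcad": ("Q.32_combined_1", "Q.36_combined_1")}

syntax = build_syntax(unique_unit_ids(sample_rows), sample_categories)
print(syntax)
```

The resulting string could be printed for inspection, as here, or passed to spss.Submit in place of the print call once it looks right.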
1
votes

I suggest a different solution: restructure the data.
First, separate the two institutions into two rows, each with its corresponding ratings:

varstocases /make institution from Institution_1 Institution_2 
  /make CoreAcad from Q.32_combined_1 Q.36_combined_1
  /make otherRting from inst1var inst2var.

You can add another make subcommand for each additional pair of rating variables (one variable for each of the two institutions).
At this point your data has one row per selected institution, along with its ratings. You can now analyze them, e.g.:

means CoreAcad otherRting by institution.

Or you can aggregate by institution to analyze their ratings. For example:

DATASET DECLARE AggByInst.
AGGREGATE  /OUTFILE='AggByInst' /BREAK=institution 
    /MCoreAcad MotherRting =MEAN(CoreAcad otherRting).
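
Since the question mentions Python, here is a rough pure-Python illustration of what the restructure-then-aggregate idea does, for one rating only. It is not SPSS code and not a replacement for the syntax above: the sample rows are invented, the helper names to_long and mean_by_institution are hypothetical, and for simplicity it skips any pair where either the institution or the rating is missing.

```python
from collections import defaultdict

# Hypothetical wide rows: one dict per respondent, keyed by the variable names above.
wide_rows = [
    {"Institution_1": 188429, "Institution_2": 171204,
     "Q.32_combined_1": 5, "Q.36_combined_1": 3},
    {"Institution_1": 171204, "Institution_2": 188429,
     "Q.32_combined_1": 4, "Q.36_combined_1": 6},
]


def to_long(rows):
    """Emit one (institution, CoreAcad rating) pair per selected institution."""
    pairs = [("Institution_1", "Q.32_combined_1"),
             ("Institution_2", "Q.36_combined_1")]
    long_rows = []
    for row in rows:
        for inst_var, rating_var in pairs:
            inst, rating = row.get(inst_var), row.get(rating_var)
            if inst is not None and rating is not None:
                long_rows.append((inst, rating))
    return long_rows


def mean_by_institution(long_rows):
    """Average the ratings per institution, regardless of selection position."""
    grouped = defaultdict(list)
    for inst, rating in long_rows:
        grouped[inst].append(rating)
    return {inst: sum(vals) / float(len(vals)) for inst, vals in grouped.items()}


means = mean_by_institution(to_long(wide_rows))
print(means)
```

With the two sample respondents, institution 188429 gets the ratings 5 and 6 (mean 5.5) and institution 171204 gets 3 and 4 (mean 3.5), which is the kind of per-institution figure the AGGREGATE step produces.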