dataset - Is there a Python module to open SPSS files?

Question

Is there a module for Python to open IBM SPSS (i.e. .sav) files? It would be great if there's something up-to-date which doesn't require any additional dll files/libraries.

possible duplicate of Exporting to SPSS files in Python Django? If you want there is also a recipe on active-state — Bakuriu
Hi, Bakuriu. It's not a duplicate, as I'm not referencing the Django framework, I'm talking about opening, as opposed to exporting/writing a file, and I mentioned the preference for something recent which doesn't require external libraries/dlls. There's some common ground between the questions, but they can elicit different, as well as similar, responses. Thanks for the link, but again, I'm trying to avoid dll files, if possible. — Lamps1829
The other answer cites Django, but it actually has nothing to do with it. Since exporting requires the ability to write a file, the chances that you can also read it are high. From what I have read, I strongly believe you have only one choice: use the .dll released by IBM. I can't find any open specification for that file format, which means that the only way to read those files is to use IBM's libraries. You can always try to reverse-engineer the format, but that would take much more time and effort. — Bakuriu
Thanks, Bakuriu. It's unfortunate, but as you said, it is looking likely that IBM's .dll release is the thing to use. — Lamps1829

Otto Fajardo · Accepted Answer · 2018-08-22T10:39:06

I have released a Python package, "pyreadstat", that reads SPSS (sav, zsav and por), Stata and SAS files. It is a wrapper around the C library ReadStat, so it is very fast. ReadStat is also the library behind the R package haven, which is widely used and very robust.

The package is self-contained. It does not require R (no need to install an additional application), and it does not depend on IBM DLLs or other external libraries.

For example, to read an SPSS sav file you would do:

import pyreadstat

df, meta = pyreadstat.read_sav("/path/to/sav/file.sav")

Here df is a pandas DataFrame, and meta contains metadata such as variable labels and value labels. read_sav reads both sav and zsav (compressed) files. There is also a function, read_por, for old por (portable) files.
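As a rough sketch of how the metadata might be used, the helper below collects the variable label and value labels for each column into one dictionary. It assumes the metadata object exposes the attributes column_names_to_labels and variable_value_labels; the function names read_sav_with_labels and describe_columns, and the stand-in values in fake_meta, are made up for illustration and are not part of pyreadstat.

```python
from types import SimpleNamespace


def describe_columns(meta):
    """Map each column name to its variable label and its value labels.

    Assumes `meta` has `column_names_to_labels` (dict: column -> label) and
    `variable_value_labels` (dict: column -> {value: label}).
    """
    value_labels = meta.variable_value_labels
    return {
        name: {"label": label, "values": value_labels.get(name, {})}
        for name, label in meta.column_names_to_labels.items()
    }


def read_sav_with_labels(path):
    """Hypothetical convenience wrapper: read a sav/zsav file and summarize its labels."""
    # Imported here so describe_columns can be tried without pyreadstat installed.
    import pyreadstat

    df, meta = pyreadstat.read_sav(path)
    return df, describe_columns(meta)


# Stand-in metadata with invented values, only to show the shape of the result.
fake_meta = SimpleNamespace(
    column_names_to_labels={"sex": "Respondent sex", "age": "Age in years"},
    variable_value_labels={"sex": {1.0: "Male", 2.0: "Female"}},
)
summary = describe_columns(fake_meta)
print(summary["sex"]["label"])
print(summary["age"]["values"])
```

With a real file, calling read_sav_with_labels("/path/to/sav/file.sav") would return the DataFrame together with the same kind of summary built from the file's own metadata.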

You can find it here: https://github.com/Roche/pyreadstat
