9
votes

I have two versions of SPSS at work: SPSS 11 running on Windows XP and SPSS 20 running on Linux. Both copies of SPSS work fine. Files created with either version open without incident in the other. That is, I can create a .sav file with SPSS 20 on Linux and open it in SPSS 11 on Windows without incident.

But if I create a .sav file with SPSS 20 and import the data into either R or PSPP (on Linux), I get a bunch of warnings. The data appears to import correctly, but I am concerned by the warnings. I do not see any warnings when importing a .sav from SPSS 11 or other .sav files I have been sent. Many of the analysts at my company use SPSS, so I've gotten SPSS files from different versions of SPSS and I have never before seen these warnings. The warning messages are nearly identical between PSPP and R, which makes sense: AFAIK, they use the same underlying libs to import the data. These are the R warnings:

Warning messages:
1: In read.spss("test.sav") :
test.sav: File-indicated value is different from internal value for at least one of  the three system values.  SYSMIS: indicated -1.79769e+308, expected -1.79769e+308; HIGHEST: 1.79769e+308, 1.79769e+308; LOWEST: -1.79769e+308, -1.79769e+308   

2: In read.spss("test.sav") :
test.sav: Unrecognized record type 7, subtype 18 encountered in system file

The .sav file is really simple. It has two columns, dumb and dumber. Both are integers. The first row contains two values of 1.0. The second row contains two values of 2.0. I can provide the file on request (I don't see any way to upload it to SO). If anyone would like to see the actual file, PM me and I'll send it to you.

dumb  dumber
1.0   1.0
2.0   2.0

Thoughts? Anyone know the best way to file a bug against R without getting roasted alive on the mailing list? :-)

EDIT: I used the term "Error" in the title line. I'll leave it, but I should not have used this word. The comments below are correct in pointing out that the messages I am seeing are warnings, not errors. I do however feel that this is made clear in the body of the question above. Clearly, the SPSS data format has changed over time and SPSS/IBM have failed to document these changes which is the root of the problem.

2
No real insight, but can echo the sentiment of getting a litany of these warnings every time I import from SPSS into R. If it makes you feel any better, my unscientific manual checks b/t R and SPSS have always shown that the data imported without error. I hope we can get some good insight into this! -- Chase
I'm glad to hear that the data you have seen appears to have imported correctly. My problem is that I can't afford to have errors, and dealing with the dates stuff is tricky enough without running the risk of errors because of whatever this warning may be telling us. I can't tell my boss that my cross-tabs are a little off because I used R rather than SPSS. It's too hard to get another job these days. :-) -- Choens
While I sympathize with your comments about the snarkiness of the R list, I also agree with the other commenters that it's not fair to count this as a bug in R. R is trying as hard as it can, and warning you that something might be wrong. I think if you want to try to fix/diagnose this yourself, you're going to have to get very familiar with debugging of C components of R code. Start by tracking down the specific line in the C code (i.e., line 585 of sfm-read.c). Figure out what function it is (read_machine_flt64_info), then do source-level debugging of ... -- Ben Bolker
(to) set a breakpoint in that function and step through it while reading in the relevant file. (I think you need the R extensions manual for this info.) If you're not set up to do this (i.e. have a debugging environment set up and be comfortable with source-level debugging of C code) this is going to be a hard slog. However, I don't see that you have much choice -- you can (1) dig in and try to figure it out yourself [and I do think that if you encounter trouble as you work your way through it that you would encounter a positive reception on the R development list ...]; (2) hire a consultant; -- Ben Bolker
(3) learn to live with the warnings. -- Ben Bolker

2 Answers

11
votes

It's not an error message. It is only a warning. SPSS refuses to document their file formats, so nobody has been motivated to reverse-engineer the structure of new "subtypes". There is no way to file a bug report without getting roasted because there is no bug ... other than a closed format, and that bug complaint should be filed with the owners of SPSS!
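To see which "subtypes" a particular file actually contains, the dictionary part of a .sav file can be walked record by record. The following is only a rough sketch in Python, based on my reading of the reverse-engineered layout described in the PSPP documentation: the 176-byte header, the record type numbers, and every offset below are assumptions to be checked against that documentation, not anything published by SPSS. The function name `list_extension_records` is made up for this illustration.

```python
import struct


def list_extension_records(data: bytes):
    """Return (subtype, size, count) for each type-7 record in a .sav dictionary.

    All offsets follow the reverse-engineered layout (assumed, not official).
    """
    if data[:4] not in (b"$FL2", b"$FL3"):
        raise ValueError("does not look like an SPSS system file")
    # The layout code (2 or 3) follows the 4-byte magic and 60-byte product
    # name; reading it tells us the byte order of the file.
    endian = "<" if struct.unpack_from("<i", data, 64)[0] in (2, 3) else ">"

    def i32(offset):
        return struct.unpack_from(endian + "i", data, offset)[0]

    pos = 176  # assumed length of the fixed file header
    found = []
    while pos + 4 <= len(data):
        rec_type = i32(pos)
        if rec_type == 2:  # variable record: 32 fixed bytes, optional extras
            has_label, n_missing = i32(pos + 8), i32(pos + 12)
            pos += 32
            if has_label:
                label_len = i32(pos)
                pos += 4 + (label_len + 3) // 4 * 4  # label padded to 4 bytes
            pos += 8 * abs(n_missing)  # missing values, 8 bytes each
        elif rec_type == 3:  # value labels
            n_labels = i32(pos + 4)
            pos += 8
            for _ in range(n_labels):
                label_len = data[pos + 8]  # one length byte after the value
                pos += 8 + (1 + label_len + 7) // 8 * 8
        elif rec_type == 4:  # variables the preceding labels apply to
            pos += 8 + 4 * i32(pos + 4)
        elif rec_type == 6:  # document record: 80-byte lines
            pos += 8 + 80 * i32(pos + 4)
        elif rec_type == 7:  # extension record: the "subtypes"
            subtype, size, count = i32(pos + 4), i32(pos + 8), i32(pos + 12)
            found.append((subtype, size, count))
            pos += 16 + size * count
        elif rec_type == 999:  # dictionary termination; case data follows
            break
        else:
            raise ValueError(f"unexpected record type {rec_type} at offset {pos}")
    return found
```

Calling it as `list_extension_records(open("test.sav", "rb").read())` should list every extension record in the file, including whichever one the reader reports as "subtype 18". Comparing the list for an SPSS 11 file against an SPSS 20 file would show which record kinds are new.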

EDIT: The R-Core is a volunteer group and takes its responsibilities very seriously. It exerts major efforts to track down anything that affects the stability of systems or produces erroneous calculations. If you were willing to be a bit more respectful of the authors of R and suggest the possibility of collaboration on the R-devel mailing list to identify solutions to this problem without using the term "bug", you would arouse much less hostility. There might be someone who would be willing to see if a simple .sav file such as the one you constructed could be examined under a hexadecimal microscope to identify whatever extreme negative value is being mistaken for another extreme negative value. Most of the R-Core is not in possession of working copies of SPSS.

You could offer this link as an example of the product of others who have attempted the reverse engineering of SPSS .sav formats:

http://svn.opendatafoundation.org/ddidext/org.opendatafoundation.data/references/pspp_source/sfm-read.c
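On the first warning: the "indicated" and "expected" values print identically, so whatever differs must lie beyond the six significant digits shown. A small Python sketch shows that this is entirely possible. It compares the most negative finite double with its immediate neighbour, which is chosen purely as an illustration; I am not claiming this is the exact pair involved in the file.

```python
import struct
import sys


def bits(x: float) -> str:
    """Hex of the 8 raw bytes of a double, most significant byte first."""
    return struct.pack(">d", x).hex()


def from_bits(hex_string: str) -> float:
    """Inverse of bits(): rebuild a double from its 16-digit hex form."""
    return struct.unpack(">d", bytes.fromhex(hex_string))[0]


most_negative = -sys.float_info.max          # -DBL_MAX
neighbour = from_bits("ffeffffffffffffe")    # one step closer to zero

# Six significant digits, the precision used in the warning text:
short_a, short_b = "%g" % most_negative, "%g" % neighbour
print(short_a, short_b)

# Full precision and raw bits reveal the difference:
print(repr(most_negative), repr(neighbour))
print(bits(most_negative), bits(neighbour))
print(most_negative == neighbour)
```

Both values print as -1.79769e+308 under `%g`, yet they are different numbers with different bit patterns. Dumping the three doubles stored in the file in hex, rather than in `%g` form, would show which of SYSMIS, HIGHEST, or LOWEST actually disagrees with what the reader expects.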

Edit: 4/2015; I have seen a recent addition to the ?read.spss help file that refers one to pkg:memisc: "A different interface also based on the PSPP codebase is available in package memisc: see its help for spss.system.file." I have used that package's function successfully (once) on files created by more recent versions of SPSS.

1
votes

The SPSS file format is not publicly documented and can change, but IBM SPSS does provide free libraries that can read and write the SAV file format. These mask any changes to the format. You can get them from the SPSS Community website (along with many other free goodies including the SPSS integration with R). Go to www.ibm.com/developerworks/spssdevcentral and look around. BTW, there have been substantial additions/changes to the sav file since year 2000, although the core data can still be read by old versions.

HTH, Jon Peck