I need to import an SPSS .sav file into R every day as a data frame without value labels. The file is 120,000+ obs and growing. This process is getting incredibly slow, so I want to make sure I'm using the fastest possible method. I've been playing around with the functions in foreign, haven, and memisc. I'm working with RDS if that makes a difference.
Edit: My file is 126343 x 33067 and 12.1 GB. I'm simply running the following code:
library(haven)
data <- read_sav(file)
I can't share this file, but to try to replicate the problem, I did the following:
library(haven)
n <- 126343
exd <- data.frame(c(
  replicate(2000, sample(letters, n, replace = TRUE), simplify = FALSE),
  replicate(1306, runif(n), simplify = FALSE)
))
dim(exd)
## [1] 126343 3306
tmp <- tempfile(fileext = ".sav")
write_sav(exd, tmp)
system.time(exd2 <- read_sav(tmp))
## user system elapsed
## 173.34 13.94 187.66
Thanks!