After importing an SPSS dataset with haven, I'm trying to use dplyr to convert the labelled variables to factors so that they show their value labels instead of the numeric codes.

Two questions: 1) How can I use dplyr to loop over the columns of the imported data frame that carry labels? This is what I have tried:

u<-which(sapply(i,function(x) !is.null(attr(x,"labels"))))
n<-mutate_each(i,(as_factor),... = u)

2) How can I get the correct dates after importing a .sav file from SPSS? i$e3 is a date, but I'm not sure how to convert it to a proper R date.

Dataset:

> dput(i)
structure(list(e = structure(c(1, 1, 2, 2, 1), label = "Sex", class = c("labelled", 
"numeric"), labels = structure(c(1, 2), .Names = c("Male", "Female"
))), e2 = structure(c(3, 3, 3, 3, 3), label = "The time from injury to surgery", class = c("labelled", 
"numeric"), labels = structure(c(1, 2, 3), .Names = c("< 12 hours", 
"12 to 24 hours", "> 24 hours"))), e3 = structure(c(13254624000, 
13431139200, 13437360000, 13493174400, 13233369600), label = "Surgery Date")), .Names = c("e", 
"e2", "e3"), row.names = c(NA, -5L), class = "data.frame")
i is a very confusing name for a data set, since it is usually reserved for iterators or indices. You should probably split your questions into two, since they seem very unrelated. Also, you don't need (arguably shouldn't have) "[r]" in the title. - Frank
I think you need mutate_each_. Try mutate_each_(i, funs(as_factor), names(u)) - aosmith
It still throws the same error... - Misha

1 Answer

I'm not sure how to adjust your dates properly: the / 10 below is a guess (you can change it to / 100 or / 1000), so check the result against dates you know. The rest can be done with base R:

i <- structure(list(
  e = structure(c(1, 1, 2, 2, 1), label = "Sex",
                class = c("labelled", "numeric"),
                labels = structure(c(1, 2), .Names = c("Male", "Female"))),
  e2 = structure(c(3, 3, 3, 3, 3), label = "The time from injury to surgery",
                 class = c("labelled", "numeric"),
                 labels = structure(c(1, 2, 3), .Names = c("< 12 hours", "12 to 24 hours", "> 24 hours"))),
  e3 = structure(c(13254624000, 13431139200, 13437360000, 13493174400, 13233369600),
                 label = "Surgery Date")),
  .Names = c("e", "e2", "e3"), row.names = c(NA, -5L), class = "data.frame")

i$e3 <- as.POSIXct(i$e3 / 10, origin = '1970-01-01')

#   e e2                  e3
# 1 1  3 2012-01-01 19:00:00
# 2 1  3 2012-07-24 03:12:00
# 3 2  3 2012-07-31 08:00:00
# 4 2  3 2012-10-03 22:24:00
# 5 1  3 2011-12-08 04:36:00
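
A possible alternative to guessing a divisor: SPSS stores date values as the number of seconds since 14 October 1582, so, assuming e3 really comes from an SPSS date variable, the raw numbers can be converted by dividing by the seconds in a day and using that origin. The sketch below works on a copy of the raw values from the question. The names spss_seconds and surgery_date are made up for illustration.

```r
# Raw e3 values as imported (seconds since 1582-10-14, the SPSS date origin).
spss_seconds <- c(13254624000, 13431139200, 13437360000,
                  13493174400, 13233369600)

# Each value is a whole number of days, which fits a date-only variable.
all(spss_seconds %% 86400 == 0)

# Convert seconds to days and count them from the SPSS origin.
surgery_date <- as.Date(spss_seconds / 86400, origin = "1582-10-14")

print(surgery_date)
# first element: "2002-10-22"
```

If the variable also carried a time of day, as.POSIXct(spss_seconds, origin = "1582-10-14", tz = "UTC") would keep it. Whether these dates are the right ones still has to be checked against the original SPSS file.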

i <- setNames(i, sapply(i, function(x) attr(x, 'label')))
i[] <- lapply(i, function(x) {
  if (!is.null(lab <- attr(x, 'labels')))
    names(lab[x])
  else x
})

#      Sex The time from injury to surgery        Surgery Date
# 1   Male                      > 24 hours 2012-01-01 19:00:00
# 2   Male                      > 24 hours 2012-07-24 03:12:00
# 3 Female                      > 24 hours 2012-07-31 08:00:00
# 4 Female                      > 24 hours 2012-10-03 22:24:00
# 5   Male                      > 24 hours 2011-12-08 04:36:00
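
One caveat about the lookup names(lab[x]): it indexes the labels by position, so it relies on the codes being 1, 2, 3, ... in that order, and it returns character vectors rather than factors. A minimal base R sketch that matches on the code values and returns factors is shown below. It uses a small stand-in data frame rather than the real import, and dat and labelled_to_factor are hypothetical names.

```r
# Stand-in for the imported data: numeric codes plus a "labels" attribute,
# mirroring columns e and e2 from the question.
dat <- data.frame(e = c(1, 1, 2, 2, 1), e2 = c(3, 3, 3, 3, 3))
attr(dat$e, "labels")  <- c(Male = 1, Female = 2)
attr(dat$e2, "labels") <- c("< 12 hours" = 1, "12 to 24 hours" = 2,
                            "> 24 hours" = 3)

# Hypothetical helper: convert one value-labelled vector to a factor.
labelled_to_factor <- function(x) {
  labs <- attr(x, "labels")
  if (is.null(labs)) return(x)  # leave columns without value labels untouched
  # Match on the code values, not on their position in the labels vector.
  factor(as.vector(x), levels = unname(labs), labels = names(labs))
}

# Apply the helper to every column while keeping the data frame structure.
dat[] <- lapply(dat, labelled_to_factor)
str(dat)
```

All declared labels are kept as factor levels, including ones that do not occur in the data. With a recent dplyr, the same helper could probably be applied with mutate(across(everything(), labelled_to_factor)) instead of lapply.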