I wonder if there is a way to apply a function to each row of a data.frame such that the column classes are preserved? Let's look at an example to clarify what I mean:
test <- data.frame(startdate = as.Date(c("2010-03-07", "2013-09-13", "2011-11-12")),
                   enddate = as.Date(c("2010-03-23", "2013-12-01", "2012-01-05")),
                   nEvents = c(123, 456, 789))
Suppose I would like to expand the data.frame test by inserting all days between startdate and enddate and distribute the number of events over those days. My first try to do so was this:
eventsPerDay1 <- function(row) {
  n_days <- as.numeric(row$enddate - row$startdate) + 1
  data.frame(date = seq(row$startdate, row$enddate, by = "1 day"),
             nEvents = rmultinom(1, row$nEvents, rep(1/n_days, n_days)))
}
apply(test, 1, eventsPerDay1)
This, however, is not possible because apply calls as.matrix on test, so it gets converted to a character matrix and all column classes are lost.
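To make the coercion concrete, here is a minimal sketch using the same test data.frame. The names m and seen are only illustrative; the point is that the function passed to apply never sees a Date or a number, only character values.

```r
test <- data.frame(
  startdate = as.Date(c("2010-03-07", "2013-09-13", "2011-11-12")),
  enddate   = as.Date(c("2010-03-23", "2013-12-01", "2012-01-05")),
  nEvents   = c(123, 456, 789)
)

# Column classes before any coercion: two Date columns and one numeric column
sapply(test, class)

# apply() operates on as.matrix(test); a matrix holds a single atomic type,
# so mixing Date and numeric columns yields a character matrix
m <- as.matrix(test)
typeof(m)

# Each "row" handed to the function is therefore a character vector
seen <- apply(test, 1, function(row) class(row))
seen
```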
I already found two workarounds which you can find below, so my question is more of a philosophical nature.
library(magrittr)
############# Workaround 1
eventsPerDay2 <- function(startdate, enddate, nEvents) {
  n_days <- as.numeric(enddate - startdate) + 1
  data.frame(date = seq(startdate, enddate, by = "1 day"),
             nEvents = rmultinom(1, nEvents, rep(1/n_days, n_days)))
}
mapply(eventsPerDay2, test$startdate, test$enddate, test$nEvents, SIMPLIFY = FALSE) %>%
  do.call(rbind, .)
############# Workaround 2
# seq_len(nrow(test)) iterates over rows; seq_along(test) would iterate over
# columns and only works here because test happens to have 3 rows and 3 columns
seq_len(nrow(test)) %>%
  lapply(function(i) test[i, ]) %>%
  lapply(eventsPerDay1) %>%
  do.call(rbind, .)
My "problem" with the workarounds is the following:
- Workaround 1: It may not be the best reason, but I simply do not like mapply. It has a different signature than the other *apply functions (as the order of arguments differs) and I always feel that a for loop would just have been clearer.
- Workaround 2: While being very flexible, I think it is not clear at first sight what is happening.
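For comparison, this is roughly what the plain for loop alternative to mapply could look like. It is only a sketch: pieces and result are illustrative names, set.seed is there purely to make the random split reproducible, and as.vector is added so that nEvents is stored as an ordinary vector rather than a one-column matrix.

```r
set.seed(1)  # only so that the multinomial draw is reproducible

test <- data.frame(
  startdate = as.Date(c("2010-03-07", "2013-09-13", "2011-11-12")),
  enddate   = as.Date(c("2010-03-23", "2013-12-01", "2012-01-05")),
  nEvents   = c(123, 456, 789)
)

# Pre-allocate one list slot per row of test
pieces <- vector("list", nrow(test))

for (i in seq_len(nrow(test))) {
  row <- test[i, ]  # a one-row data.frame, so column classes are kept
  n_days <- as.numeric(row$enddate - row$startdate) + 1
  pieces[[i]] <- data.frame(
    date    = seq(row$startdate, row$enddate, by = "1 day"),
    nEvents = as.vector(rmultinom(1, row$nEvents, rep(1 / n_days, n_days)))
  )
}

# Stack the per-row results into one data.frame
result <- do.call(rbind, pieces)
head(result)
```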
So does anyone know a function whose call would look like apply(test, 1, eventsPerDay1) and that will work?
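To illustrate the kind of call I have in mind, here is a sketch of a small wrapper. applyRows is a hypothetical helper that I made up for this example, not a function from base R or any package; it assumes that splitting the data.frame into one-row data.frames with split is acceptable, and it returns a list that still has to be combined.

```r
set.seed(1)  # only so that the multinomial draw is reproducible

test <- data.frame(
  startdate = as.Date(c("2010-03-07", "2013-09-13", "2011-11-12")),
  enddate   = as.Date(c("2010-03-23", "2013-12-01", "2012-01-05")),
  nEvents   = c(123, 456, 789)
)

eventsPerDay1 <- function(row) {
  n_days <- as.numeric(row$enddate - row$startdate) + 1
  data.frame(date = seq(row$startdate, row$enddate, by = "1 day"),
             nEvents = rmultinom(1, row$nEvents, rep(1 / n_days, n_days)))
}

# Hypothetical helper (not part of base R): call fun once per row, passing
# each row as a one-row data.frame so that column classes survive
applyRows <- function(df, fun, ...) {
  rows <- split(df, seq_len(nrow(df)))
  lapply(rows, fun, ...)
}

# The call now reads almost like apply(test, 1, eventsPerDay1)
result <- do.call(rbind, applyRows(test, eventsPerDay1))
rownames(result) <- NULL  # drop the "1.1", "1.2", ... names created by rbind
str(result)
```

The trade-off is that every call receives a full data.frame, which is convenient for the row$column syntax but presumably slower than mapply on large inputs.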
lapply looping over the sequence of rows and not apply – akrun
data.table. Please check if that makes it any better – akrun
apply() is meant to work with matrices (and if you pass in a data.frame, it's converted via as.matrix) and matrices can only have one atomic data type. Do not use apply() with data.frames. – MrFlick