I'm trying to replace, step by step, all the for-loops I wrote (I read somewhere that they are bad programming practice :-( ), and I'm trying to do that with purrr. I basically understand the map() functions as:

target <- map(list_which_is_argument_for_function,
              function)

But what do I have to do if the arguments don't come from, e.g., the data frame I'm applying the map() function to?

So let's say I have three data frames.
df1 is essentially a collection of images: df1 = (Image_ID, File_Path, Date). df2 is a collection of site coordinates and dates: df2 = (X, Y, Date).
df3 is a matching list that maps every coordinate to its corresponding image: df3 = (Coordinate_Index, Image_Index). The data frames differ in length from each other.

Now I want to add columns to df3 for e.g. date diff calculation. In a for-loop I would do:

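# df1 holds the images and df2 the site coordinates, so df1 is indexed by
# Image_Index and df2 by Coordinate_Index.
for (i in seq_len(nrow(df3))) {
    df3$DateDiff[i] <- datediff_func(df1$Date[df3$Image_Index[i]], df2$Date[df3$Coordinate_Index[i]])
}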

But if I do:

df3$DATE_DIFF <- map2(df1$Date[df3$Image_Index], 
                      df2$Date[df3$Coordinate_Index], 
                      abs(difftime(as.Date(), as.Date(), units = 'days')))

I get:

argument “x” is missing, with no default

In other cases, where I have to use extract() from the raster package, I need to pass the coordinates as cbind(X, Y). That results in a vector length error because, obviously, binding X and Y together produces an object with twice the length of the other argument list.

Could anyone explain the purrr way to me?

It is good to learn about these things (although I would start with base R and lapply). But you have been misled; for loops are fine, except when you have very many iterations (say 1000 or more), because then they might be too slow. They are really bad to use when a function is already vectorized, and this is often overlooked (e.g. 1:10 + 2). - Robert Hijmans
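
To illustrate the vectorization point from the comment with the date difference in the question, here is a minimal base R sketch. The sample data is hypothetical and only mirrors the column layout described in the question; it also assumes the Date columns are already of class Date. Indexing with a whole column of df3 returns one date per row of df3, so difftime() can handle all rows in a single call without a loop or map().

```r
# Hypothetical sample data shaped like the data frames in the question.
df1 <- data.frame(Image_ID = 1:3,
                  File_Path = c("a.tif", "b.tif", "c.tif"),
                  Date = as.Date(c("2020-01-01", "2020-02-01", "2020-03-01")),
                  stringsAsFactors = FALSE)
df2 <- data.frame(X = c(10.1, 10.2),
                  Y = c(50.1, 50.2),
                  Date = as.Date(c("2020-01-15", "2020-02-20")))
df3 <- data.frame(Coordinate_Index = c(1L, 1L, 2L, 2L),
                  Image_Index = c(1L, 2L, 2L, 3L))

# One image date and one site date per row of df3.
image_dates <- df1$Date[df3$Image_Index]
site_dates  <- df2$Date[df3$Coordinate_Index]

# difftime() is vectorized, so this computes every row at once.
df3$DateDiff <- abs(as.numeric(difftime(image_dates, site_dates, units = "days")))
```

If datediff_func from the question is itself vectorized, the same pattern should apply to it directly.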

1 Answer

for loops are not bad in themselves. Many for loops are written inefficiently, which makes them perform badly compared to the alternatives.

To change the for loop to map, assuming the output of datediff_func is a single numeric value, you can do:

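# df1 holds the images and df2 the site coordinates, so df1 is indexed by
# Image_Index and df2 by Coordinate_Index.
df3$DateDiff <- purrr::map_dbl(seq_len(nrow(df3)), ~datediff_func(
               df1$Date[df3$Image_Index[.x]], df2$Date[df3$Coordinate_Index[.x]]))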
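
As for the map2() attempt in the question: the error most likely arises because the third argument has to be a function (or a ~ formula), whereas abs(difftime(as.Date(), as.Date(), ...)) is a call that R tries to evaluate, and as.Date() then has no x. The sketch below is one possible way to write it, not the only one. The sample data and lookup_stub() are hypothetical; lookup_stub() merely stands in for a per-image call such as raster's extract() and takes a one-row coordinate matrix. The lambda can refer to df1 and df2 even though map2() iterates over columns of df3.

```r
library(purrr)

# Hypothetical sample data shaped like the data frames in the question.
df1 <- data.frame(Image_ID = 1:3,
                  File_Path = c("a.tif", "b.tif", "c.tif"),
                  Date = as.Date(c("2020-01-01", "2020-02-01", "2020-03-01")),
                  stringsAsFactors = FALSE)
df2 <- data.frame(X = c(10.1, 10.2),
                  Y = c(50.1, 50.2),
                  Date = as.Date(c("2020-01-15", "2020-02-20")))
df3 <- data.frame(Coordinate_Index = c(1L, 1L, 2L, 2L),
                  Image_Index = c(1L, 2L, 2L, 3L))

# The third argument is a formula lambda: .x is one image index,
# .y the matching coordinate index. df1 and df2 are found by scoping.
df3$DATE_DIFF <- map2_dbl(
  df3$Image_Index,
  df3$Coordinate_Index,
  ~ abs(as.numeric(difftime(df1$Date[.x], df2$Date[.y], units = "days")))
)

# Hypothetical stand-in for a per-image lookup; NOT a raster function.
# It expects a file path and a 1 x 2 coordinate matrix.
lookup_stub <- function(path, xy) {
  stopifnot(is.matrix(xy), nrow(xy) == 1, ncol(xy) == 2)
  nchar(path) + sum(xy)
}

# pmap_dbl() walks three equally long vectors in parallel; cbind() is
# called inside the function, so it builds one 1 x 2 matrix per row
# instead of one object with twice as many elements.
df3$Value <- pmap_dbl(
  list(df1$File_Path[df3$Image_Index],
       df2$X[df3$Coordinate_Index],
       df2$Y[df3$Coordinate_Index]),
  function(path, x, y) lookup_stub(path, cbind(x, y))
)
```

Keeping cbind() inside the function is the design choice that avoids the length mismatch: every argument list passed to pmap_dbl() has exactly nrow(df3) elements.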