
I have a very large data frame that contains 100 rows and 400000 columns.

To sample each column, I can simply do:

df <- apply(df, 2, sample)

But I want every pair of columns to be sampled together. For example, if col1 is originally c(1,2,3,4,5) and col2 is c(6,7,8,9,10), and after resampling col1 becomes c(1,3,2,4,5), then I want col2 to become c(6,8,7,9,10), following the same permutation as col1. The same goes for col3 & col4, col5 & col6, etc.

I wrote a for loop to do this, which takes forever. Is there a better way? Thanks!


1 Answer


You could try this: split the data frame into groups of two columns with split.default, shuffle the rows of each sub data frame with a single permutation, and then bind the pieces back together:

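df <- data.frame(col1 = 1:5, col2 = 6:10, col3 = 11:15)

index <- seq_len(nrow(df))
# Group columns in pairs: (col1, col2), (col3, col4), ...; a leftover column forms its own group
groups <- (seq_along(df) - 1) %/% 2
cbind.data.frame(
    # setNames(..., NULL) drops the group names so they are not prefixed to the column names
    setNames(lapply(
        split.default(df, groups),
        # one permutation of the rows per group, applied to every column in that group
        function(sdf) sdf[sample(index), , drop = FALSE]),
    NULL)
)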

#  col1 col2 col3
#5    5   10   12
#4    4    9   11
#1    1    6   15
#2    2    7   14
#3    3    8   13
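
With 400000 columns, building 200000 small data frames may still be slow. If all columns have the same type, a matrix-based variant might be faster. The following is only a sketch under that assumption, not something I have benchmarked on your data; shuffle_pairs is a made-up helper name. It draws one permutation per pair of columns and applies all of them in a single indexing step:

```r
# Hypothetical helper: shuffle the rows of a matrix, one permutation per pair of columns.
shuffle_pairs <- function(m) {
  n <- nrow(m)
  p <- ncol(m)
  n_pairs <- ceiling(p / 2)

  # n x n_pairs matrix: column k holds the row permutation for the k-th pair
  perms <- matrix(
    vapply(seq_len(n_pairs), function(i) sample.int(n), integer(n)),
    nrow = n
  )

  # Repeat each permutation for both columns of its pair (n x p matrix of row indices)
  row_idx <- perms[, (seq_len(p) - 1L) %/% 2L + 1L, drop = FALSE]

  # Turn (row, column) positions into linear indices into m
  lin_idx <- row_idx + rep((seq_len(p) - 1L) * n, each = n)

  # as.vector() avoids R treating a two-column index matrix as (row, col) pairs
  out <- m[as.vector(lin_idx)]
  dim(out) <- dim(m)
  dimnames(out) <- dimnames(m)
  out
}

m <- matrix(1:15, nrow = 5, dimnames = list(NULL, c("col1", "col2", "col3")))
shuffle_pairs(m)
```

For a data frame, something like as.data.frame(shuffle_pairs(as.matrix(df))) should work, but note that as.matrix coerces mixed column types to a common type, so this only makes sense when the columns are all numeric (or all character).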