I want to sample 5 random rows 1,000 times and summarize the samples in a data frame. I have a problem with `replace = FALSE` and wonder where to change it to `replace = TRUE`.
I have a dataset of 5,000 rows which looks (simplified) like this:
Fund.ID Vintage Type Region.Focus Net.Multiple Size
[1,] 4716 2003 2 US 1.02 Small
[2,] 2237 1998 25 Europe 0.03 Medium
[3,] 1110 1992 2 Europe 1.84 Medium
[4,] 12122 1997 25 Asia 2.04 Large
[5,] 5721 2006 25 US 0.86 Mega
[6,] 730 1998 2 Europe 0.97 Small
This is my function, which starts with one random row and includes a constraint on the 5 rows being drawn:
simulate <- function(inv.period) {
  start <- sample_n(dataset, 1, replace = TRUE)  # draw random first fund
  t <- start$Vintage:(start$Vintage + inv.period)  # define investment period contingent on first fund
  fof <- dataset[sample(which(dataset$Vintage %in% t), 5, replace = FALSE), ]  # include constraint, 5 funds in portfolio
}
#replicate this function 1,000 times
#and give out as a data frame with portfolios classified
library(plyr)
library(dplyr)
fof.5 <- rdply(1000, simulate(4))
rename(fof.5, FoF.ID = .n)
If I use `replace = FALSE` in the `simulate` function (after `fof <-`), I get this error:
Error in sample.int(length(x), size, replace, prob) : cannot take a sample larger than the population when 'replace = FALSE'
The whole expression works if I put `replace = TRUE`. However, this would not be correct, as a row could be drawn twice in the same sample, which I do not want.
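The error can be reproduced without the dataset. The following is a minimal sketch, assuming a hypothetical pool of only two candidate row indices (the values 2 and 4 are illustrative and not taken from the real data):

```r
# Hypothetical pool of candidate row indices, smaller than the 5 rows requested.
pool <- c(2L, 4L)

# Without replacement, sample() cannot return more elements than the pool holds,
# so this call raises the error quoted above.
msg <- tryCatch(
  {
    sample(pool, 5, replace = FALSE)
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)

# With replacement the call succeeds, but indices necessarily repeat
# (5 draws from only 2 values).
dup <- sample(pool, 5, replace = TRUE)
print(anyDuplicated(dup) > 0)
```

So the error depends on how many rows satisfy the Vintage constraint for a given first fund, not on the size of the full dataset.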
Is there a way to use `replace = FALSE` when the rows are drawn, but `replace = TRUE` for the overall dataset? That is: a row can be drawn only once within a sample, but can be drawn again in another sample.
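Separate calls to `sample()` are independent of each other, so `replace = FALSE` only prevents repeats within one call. A row drawn for one portfolio can still be drawn for another. The sketch below illustrates this in base R under stated assumptions: `toy` is a hypothetical stand-in for the real dataset, `simulate_safe` is a hypothetical variant of `simulate`, and returning `NULL` when fewer than 5 rows qualify is one possible policy, not necessarily the intended one.

```r
set.seed(1)  # arbitrary seed, only for reproducibility

# Hypothetical toy data standing in for the 5,000-row dataset.
toy <- data.frame(
  Fund.ID = 1:200,
  Vintage = sample(1990:2006, 200, replace = TRUE)
)

# Hypothetical variant of simulate(): returns NULL when the pool is too small.
simulate_safe <- function(data, inv.period, n = 5) {
  start <- data[sample.int(nrow(data), 1), ]            # random first fund
  window <- start$Vintage:(start$Vintage + inv.period)  # investment period
  pool <- which(data$Vintage %in% window)               # candidate rows
  if (length(pool) < n) return(NULL)                    # assumption: skip such draws
  # Index into pool via sample.int() so a length-one pool is never
  # misread by sample() as the range 1:pool.
  data[pool[sample.int(length(pool), n)], ]             # n distinct rows
}

# Each call is an independent draw, so rows may recur across portfolios.
portfolios <- replicate(1000, simulate_safe(toy, 4), simplify = FALSE)
portfolios <- Filter(Negate(is.null), portfolios)

# No Fund.ID is repeated inside any single portfolio ...
within_ok <- all(vapply(portfolios, function(p) !anyDuplicated(p$Fund.ID), logical(1)))
# ... but the same Fund.ID does appear in more than one portfolio.
all_ids <- unlist(lapply(portfolios, function(p) p$Fund.ID))
across_repeat <- anyDuplicated(all_ids) > 0
print(c(within_ok = within_ok, across_repeat = across_repeat))
```

The resulting list can be bound into a single data frame, for example with `do.call(rbind, portfolios)`.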
`library(package)` it helps others to replicate your code and find solutions. - Pierre L
`simulate` does not return any value. It will also fail any time the length of `t` is less than `5`. For example, let's say `start` returns row 4 from its sample. Then `start$Vintage` will be `1997`. Now let's say `inv.period` is 1. Two values are being sampled, rows 2 and 4. You are asking for 5 values to be extracted without replacement. That doesn't make sense. - Pierre L
`sample_n(dataset, 1, replace=TRUE)` and `sample_n(dataset, 1, replace=FALSE)`? - Pierre L
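The pool-size concern raised in the comments can be checked on the six example rows from the question. This is a rough sketch that uses only the `Vintage` column and `inv.period = 4` (the value passed to `simulate` in the question). The names `vintage` and `pool_size` are introduced here for illustration.

```r
# The six example rows from the question (only the Vintage column is needed).
vintage <- c(2003, 1998, 1992, 1997, 2006, 1998)
inv.period <- 4  # the value passed to simulate() in the question

# For each possible first fund, count the rows whose Vintage falls in its window.
pool_size <- vapply(
  vintage,
  function(v) sum(vintage %in% v:(v + inv.period)),
  integer(1)
)
print(pool_size)

# TRUE wherever sampling 5 rows without replacement would raise the error.
print(pool_size < 5)
```

On this six-row excerpt every window holds fewer than 5 rows. Whether that also happens in the full 5,000-row dataset depends on how the vintages are distributed.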