0
votes

I need to sample a data frame by one column, then take each sampled value together with other values from the same row for further calculations.

For example, below is some code. Let's say I sample on the first column (unused) and it selects a value. I then want to output that value and the value two columns over (carryover) in the same row.

dat <- data.frame(cbind(rnorm(30, 600, sd = 100), rnorm(30, 300, sd = 50), rnorm(30, 200, sd = 50), rnorm(30, 200, sd = 200)))
colnames(dat) <- c("unused", "deduct", "carryover", "used")
unused <- dat$unused

for (i in 1:10) {
  unused[i] <- matrix(sample(unused, size = 1, replace = TRUE))
}
1
I have a hard time understanding what you are trying to do. Do you want to sample rows (or parts of rows) from a data.frame? Could you give an example of what your expected output would look like, given your example data? - dario
Let's say I sample on unused and it picks a value from row 21. I then want to take that value and the value of carryover from row 21 (which would have been in col. 3), i.e., the same row that the sampled value was taken from. - Angus
We could first sample from a pool of row numbers (as many rows as in your data.frame). Then we use these as row-indexes and select the rows and only the two columns of interest as a new data frame - dario
So, rather than sample by column, sample by row and output the whole row and just extract what I need. - Angus

1 Answer

1
votes
  dat <- data.frame(cbind(rnorm(30,600,sd=100),rnorm(30,300,sd=50),rnorm(30,200,sd=50),rnorm(30,200,sd=200)))
  colnames(dat) <- c("unused","deduct","carryover","used")

  set.seed(1)
  # Sample 10 row numbers (without replacement), then keep only the two columns of interest
  subsample <- dat[sample(seq_len(nrow(dat)), size = 10), c("unused", "carryover")]

subsample is now (your values will differ, because dat is generated before set.seed(1) is called):

       unused carryover
  25 696.4313  120.7247
  4  541.7930  299.9415
  7  621.7899  194.5108
  1  383.2561  166.8942
  2  555.0840  210.4745
  23 509.4725  179.7150
  11 716.6414  214.2963
  14 531.1205  227.8825
  18 496.7440  140.7691
  19 588.5641  142.7861
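
The question's loop used replace=TRUE, so here is a minimal sketch of the same row-index idea with replacement, followed by a calculation on the paired values. The seed value, the names idx and picked, and the total column are assumptions made for illustration; they are not from the original post.

```r
# Hypothetical example data with the same column names as in the question.
# The seed is set before generating the data so the whole example is reproducible.
set.seed(42)
dat <- data.frame(
  unused    = rnorm(30, 600, sd = 100),
  deduct    = rnorm(30, 300, sd = 50),
  carryover = rnorm(30, 200, sd = 50),
  used      = rnorm(30, 200, sd = 200)
)

# Draw 10 row numbers with replacement, so the same row may be picked more than once.
idx <- sample(nrow(dat), size = 10, replace = TRUE)

# Pull the sampled value and its same-row partner in one step.
picked <- dat[idx, c("unused", "carryover")]

# Both columns come from the same rows, so they can be combined directly.
picked$total <- picked$unused + picked$carryover
```

Keeping idx around is handy if other columns from the same rows are needed later, e.g. dat$deduct[idx].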