
I'm trying to find a way to store two values at a single index of a vector.

I have a matrix that I'm translating into vector coordinates. The goal is to take a random sample of that vector and then translate the positions of the sampled elements back into matrix coordinates.

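filter_function <- function(df, perc) {
  rows <- dim(df)[1]
  cols <- dim(df)[2]

  # Flatten the matrix row by row into a list, one cell per element
  vec <- vector("list", rows * cols)
  for (i in 1:rows) {
    for (j in 1:cols) {
      vec[(i - 1) * cols + j] <- df[i, j]
    }
  }

  # Sample a fraction `perc` of the cells (note: this returns values, not positions)
  n <- rows * cols
  filter <- sample(vec, n * perc)
}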

The problem is that sample() returns the sampled values rather than their positions in the vector. I also don't know how to translate those positions back into row and column indices. Is there an alternative in which I change the assignment inside the inner loop to something like this:

vec[(i-1)*cols+j]<-c(i,j)

Unsurprisingly, this gives me the warning

In vec[(i - 1) * cols + j] <- c(i, j) : number of items to replace is not a multiple of replacement length

Is there something similar I can do? Once I have the coordinates, I'd ideally like to remove the values at those positions in one quick step, something like

df<-df[-filter]

Note: my data contains many repeated values (lots of 0s and 1s and everything in between). Taking a random sample of values and then locating them with which or match therefore wouldn't work.
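A matrix can't have individual cells deleted in the way `df[-filter]` suggests. Negative linear indexing on a matrix returns a plain vector of the remaining values. The sketch below takes a different approach: it samples positions rather than values, which sidesteps the repeated-values problem. It assumes that "removing" a cell means marking it as missing, so it sets the sampled positions to `NA` and the matrix keeps its shape. The names `m`, `idx`, `rc` and `m_filtered` are hypothetical, and `perc` mirrors the question's parameter.

```r
# Hypothetical 4 x 5 matrix with many repeated values, like the question's data
set.seed(1)
m <- matrix(sample(c(0, 0.5, 1), 20, replace = TRUE), nrow = 4)

perc <- 0.25                                        # fraction of cells to drop
idx  <- sample(length(m), round(perc * length(m)))  # distinct linear (column-major) positions

# Row/column coordinates of the sampled cells, one row per cell
rc <- arrayInd(idx, dim(m))

# Negative indexing drops the cells but returns a plain vector, not a matrix
remaining <- m[-idx]

# Setting the cells to NA "removes" them while keeping the 4 x 5 layout
m_filtered <- m
m_filtered[idx] <- NA
```

Because positions are sampled instead of values, duplicates in the data have no effect on which cells are picked.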

Please help!

Instead of storing them in a vector, store them in a list; that would be easier. I really don't understand what is happening, though. If you can shed more light on it, someone may be able to help you. - Onyambu
And what does df(i,j) mean? That isn't valid R syntax. Given your definition, I doubt df is a function, since you also call dim(df). - Onyambu
I just corrected it; it was supposed to be df[i,j]. df is not a function. df is a matrix, which is why I'm indexing it and using dim(df). I want to sample the matrix completely at random. 'Sample' only samples rows or columns of a matrix, so I'm translating the matrix into a vector to sample from, and then I want to translate the sampled data back into matrix coordinates. I need those coordinates to know which data points to remove. All of this was explained above. - Ksenia Kasey Arzumanova
@KseniaKaseyArzumanova if the answer below worked for you please consider accepting it as a solution (check mark to the left). This lets the community know it worked and that the issue should be closed. - CPak
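The suggestion in the first comment to use a list does work, provided each coordinate pair goes into its own list element via `[[` rather than `[`. The warning in the question comes from `[`, which tries to fit the two numbers of `c(i, j)` into a single slot. The sketch below uses a small hypothetical matrix in place of the question's `df`; `coords` and `picked` are illustrative names.

```r
# Hypothetical example matrix standing in for the question's `df`
df <- matrix(c(0, 1, 0.5, 1, 0, 0.25), nrow = 2)

rows <- nrow(df)
cols <- ncol(df)

# One list element per cell; each element holds a (row, column) pair
coords <- vector("list", rows * cols)
for (i in seq_len(rows)) {
  for (j in seq_len(cols)) {
    # `[[` stores a whole object (here a length-2 vector) in one slot,
    # whereas `[` would try to spread c(i, j) over a single position
    coords[[(i - 1) * cols + j]] <- c(i, j)
  }
}

# Sample positions rather than values, so repeated values do not matter
set.seed(42)
picked <- sample(seq_along(coords), 2)
coords[picked]  # list of c(row, column) pairs for the sampled cells
```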

1 Answer


You can accomplish this with unlist.

Example data

df <- as.data.frame(matrix(1:25,nrow=5))

  V1 V2 V3 V4 V5
1  1  6 11 16 21
2  2  7 12 17 22
3  3  8 13 18 23
4  4  9 14 19 24
5  5 10 15 20 25

Operation

Unlist your data frame into a vector. Notice that it unlists column by column:

m <- unlist(df)

# V11 V12 V13 V14 V15 V21 V22 V23 V24 V25 V31 V32 V33 V34 V35 V41 V42 V43 V44 V45 V51 V52 V53 V54 V55 
#   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25
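Column-by-column is the same column-major order that R uses for matrices. Position k in unlist(df) is therefore the same cell as linear index k in as.matrix(df). Here is a quick check on the 5 x 5 example data; `same_order` is a hypothetical name.

```r
df <- as.data.frame(matrix(1:25, nrow = 5))

m <- unlist(df)
# Drop the V11, V12, ... names and compare with the matrix's own column-major order
same_order <- identical(unname(m), as.vector(as.matrix(df)))
same_order  # TRUE
```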

Sample a random index:

set.seed(1)
index <- sample(1:length(m), 1)
# suppose this gives 2 (the exact draw depends on your R version's random number generator)
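The question asks for a fraction perc of the cells rather than a single one. In the sketch below, perc is the question's parameter and `k` is a hypothetical name. `sample.int(n, size)` draws `size` distinct integers from 1:n (replace = FALSE is the default), so each cell is picked at most once.

```r
df <- as.data.frame(matrix(1:25, nrow = 5))
m  <- unlist(df)

perc <- 0.2
k    <- round(perc * length(m))   # number of cells to sample (5 here)

set.seed(1)
# Distinct linear indices into m (and into the matrix, column-major)
index <- sample.int(length(m), k)
index
```

Using sample.int(length(m), k) also avoids a surprise: when sample(x, ...) is given a single number x, it samples from 1:x.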

To get the row and column of that index, and hence its value in the data frame:

R <- ifelse(index %% nrow(df) == 0, nrow(df), index %% nrow(df))  # row
C <- ifelse(index %% nrow(df) == 0, index / nrow(df), floor(index / nrow(df))+1)   # column

df[R,C]

# 2
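Base R also provides arrayInd(), which converts linear indices to (row, column) pairs directly and handles many indices at once. This is an alternative to the ifelse approach rather than part of it. The sketch below uses the example df; the values in `index` are hypothetical.

```r
df <- as.data.frame(matrix(1:25, nrow = 5))

index <- c(2, 5, 6, 25)
# Each row of rc is (row, column) for the matching linear index
rc <- arrayInd(index, dim(df))
rc

# A two-column index matrix selects one cell per row
vals <- as.matrix(df)[rc]
vals
```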

A more detailed look at the ifelse statements above

To convert an index in the vector to a position in the data frame, consider the column index first. If the index is in 1:5, the value is in the 1st column of df; if it is in 6:10, it is in the 2nd column; and so on. So the column index is roughly, but not exactly, index / nrow(df). For index == 2, that gives 2 / 5 = 0.4. I round down, floor(0.4) = 0, and add 1 to get column 1.

This fails when index is a multiple of 5: 5 / 5 = 1 and floor(1) + 1 = 2, but index 5 is still in column 1. I handle that case with ifelse. If index is a multiple of 5 (index %% nrow(df) == 0 is TRUE), use index / nrow(df); otherwise use floor(index / nrow(df)) + 1.

The row index works the same way using the modulus operator %%, which returns the remainder. A remainder of 0 means the last row, nrow(df); any other remainder is the row itself.
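Both special cases disappear if you shift to zero-based counting first. This is an alternative formulation, not the one used in the answer. With n standing for nrow(df), row = (index - 1) %% n + 1 and column = (index - 1) %/% n + 1, where %/% is integer division. The sketch below checks the formulas on the example df.

```r
df <- as.data.frame(matrix(1:25, nrow = 5))
n  <- nrow(df)

index <- 1:25
# Shifting to 0-based positions removes the multiple-of-n special case
R <- (index - 1) %% n + 1    # remainder -> row
C <- (index - 1) %/% n + 1   # integer quotient -> column

# e.g. index 5 -> row 5, column 1; index 6 -> row 1, column 2
cbind(index, R, C)[5:6, ]
```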

Double check

Let's make sure we find the right row and column for every possible index:

for (index in 1:25) {
  R <- ifelse(index %% nrow(df) == 0, nrow(df), index %% nrow(df))
  C <- ifelse(index %% nrow(df) == 0, index / nrow(df), floor(index / nrow(df)) + 1)
  print(df[R, C])
}

# 1
# 2
# 3
# ...
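The same check can be made to fail loudly, so you don't have to inspect printed output. The sketch below recomputes every cell through the answer's ifelse formulas and compares the result with unlist(df) in one step; `recovered` is a hypothetical name.

```r
df <- as.data.frame(matrix(1:25, nrow = 5))
n  <- nrow(df)

recovered <- sapply(seq_len(n * ncol(df)), function(index) {
  R <- ifelse(index %% n == 0, n, index %% n)
  C <- ifelse(index %% n == 0, index / n, floor(index / n) + 1)
  df[R, C]
})

# Stops with an error if any index maps to the wrong cell
stopifnot(identical(recovered, unname(unlist(df))))
```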