
I have vectors in R containing a lot of 0's and a few non-zero numbers. Each vector starts with a non-zero number.

For example <1,0,0,0,0,0,2,0,0,0,0,0,4,0,0,0>

I would like to set all of the zeros equal to the most recent non-zero number.

I.e. this vector would become <1,1,1,1,1,1,2,2,2,2,2,2,4,4,4,4>

I need to do this for about 100 vectors containing around 6 million entries each. Currently I am using a for loop:

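# vector[1] is non-zero, so vector[k - 1] is never read when k = 1
for (k in seq_along(vector)) {
  if (vector[k] == 0) {
    # copy the previous (already filled) value forward
    vector[k] <- vector[k - 1]
  }
}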

Is there a more efficient way to do this?

Thanks!

Are the non-zero values increasing (i.e. is the vector sequential apart from the zeros)? If so, I think you should be able to use cummax, i.e.: vector <- cummax(vector) - rosscova
I'd think mostly the former, but really just boosting it as a good point. I didn't specifically up-vote, but I do think it's a great suggestion. - rosscova
If you hover over the upvote arrow, it says "this comment adds something useful to the post". I think that is exactly what it means. Maybe not a full answer, but adds something useful. - G5W
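The cummax suggestion only works when the non-zero values never decrease (and are positive, since a zero would otherwise exceed a negative value). A quick check on the question's vector and on a hypothetical vector whose last non-zero value is smaller than an earlier one:

```r
v1 <- c(1,0,0,0,0,0,2,0,0,0,0,0,4,0,0,0)  # non-zero values increase: 1, 2, 4
v2 <- c(1,0,0,0,0,0,2,0,0,0,0,0,1,0,0,0)  # last non-zero value (1) is smaller than 2

cummax(v1)  # 1 1 1 1 1 1 2 2 2 2 2 2 4 4 4 4  (correct)
cummax(v2)  # 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2  (wrong: the trailing 1s become 2s)
```

For the general case, one of the answers below is needed.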

3 Answers

7
votes

One option would be to replace those 0s with NA, then use zoo::na.locf:

x <- c(1,0,0,0,0,0,2,0,0,0,0,0,4,0,0,0)
x[x == 0] <- NA
zoo::na.locf(x)  ## you possibly need: `install.packages("zoo")`
# [1] 1 1 1 1 1 1 2 2 2 2 2 2 4 4 4 4

Thanks to Richard for showing me how to use replace, which does both steps in one line:

zoo::na.locf(replace(x, x == 0, NA))
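na.locf stands for "last observation carried forward": each NA takes the most recent non-NA value before it. For readers without zoo, here is a rough base-R sketch of the same idea; the helper name locf_base is hypothetical and not part of zoo. It records the position of every non-NA element and takes a running maximum, so each element points at the latest non-NA position seen so far.

```r
# Hypothetical helper (not part of zoo): base-R "last observation carried forward"
locf_base <- function(v) {
  # position of each non-NA element, 0 where v is NA
  pos <- seq_along(v) * !is.na(v)
  # running maximum = position of the most recent non-NA element (0 before the first one)
  idx <- cummax(pos)
  # indexing with 0 returns nothing, so leading NAs are dropped,
  # similar to na.locf's default na.rm = TRUE
  v[idx]
}

x <- c(1,0,0,0,0,0,2,0,0,0,0,0,4,0,0,0)
locf_base(replace(x, x == 0, NA))
```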

4
votes

You could try this:

k <- c(1,0,0,0,0,0,2,0,0,0,0,0,4,0,0,0)
k[which(k != 0)[cumsum(k != 0)]]

Or, for a case where cummax would not be appropriate (the last non-zero value is smaller than an earlier one):

k <- c(1,0,0,0,0,0,2,0,0,0,0,0,1,0,0,0)
k[which(k != 0)[cumsum(k != 0)]]

Logic:

  • I am keeping "track" of the indices of the non-zero elements, which(k != 0). Let's denote this new vector x, x = c(1, 7, 13).

  • Next I am going to "sample" this new vector. How? From k I create a counter that increments every time there is a non-zero element, cumsum(k != 0). Let's denote this new vector y, y = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3).

  • I am "sampling" from vector x with x[y], i.e. taking the first element of x 6 times, then the second element 6 times and the third element 4 times. Let's denote this new vector z, z = c(1, 1, 1, 1, 1, 1, 7, 7, 7, 7, 7, 7, 13, 13, 13, 13).

  • Finally I am "sampling" from vector k with k[z], i.e. taking the first element 6 times, then the 7th element 6 times, then the 13th element 4 times.
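The steps above can be traced in R by keeping each intermediate vector, using the same names x, y and z as the explanation:

```r
k <- c(1,0,0,0,0,0,2,0,0,0,0,0,4,0,0,0)

x <- which(k != 0)   # positions of the non-zero elements: 1 7 13
y <- cumsum(k != 0)  # counter that increases at each non-zero element
z <- x[y]            # for every element, the position of the most recent non-zero
k[z]                 # 1 1 1 1 1 1 2 2 2 2 2 2 4 4 4 4
```

The whole thing is a couple of vectorised passes (which, cumsum and two subsetting operations), with no explicit R-level loop.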

1
vote

Adding to @李哲源's answer:

If the leading NAs also need to be replaced (with the nearest non-NA value), while the other NAs are replaced with the last non-NA value, the code can be:

x <- c(0,0,1,0,0,0,0,0,2,0,0,0,0,0,4,0,0,0)
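# inner na.locf carries values forward but keeps the leading NAs (na.rm = FALSE);
# outer na.locf(fromLast = TRUE) then fills those leading NAs backward
zoo::na.locf(zoo::na.locf(replace(x, x == 0, NA), na.rm = FALSE), fromLast = TRUE)
# you possibly need: `install.packages("zoo")`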
# [1] 1 1 1 1 1 1 1 1 2 2 2 2 2 2 4 4 4 4
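Without zoo, the indexing trick from the second answer can be adapted for leading zeros: cumsum(x != 0) is 0 before the first non-zero value, and clamping it up to 1 with pmax() maps those leading zeros to the first non-zero value. This is a sketch combining the two answers, not code from either author, and it assumes the vector contains at least one non-zero value.

```r
x <- c(0,0,1,0,0,0,0,0,2,0,0,0,0,0,4,0,0,0)

nz <- which(x != 0)              # positions of the non-zero values: 3 9 15
run <- pmax(cumsum(x != 0), 1L)  # 0 for leading zeros, clamped up to 1
filled <- x[nz[run]]             # leading zeros take the first non-zero value
filled                           # 1 1 1 1 1 1 1 1 2 2 2 2 2 2 4 4 4 4
```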