Using simulations, I want to test and demonstrate the effects of "censored" data, where some cases are unavailable to us or have values outside the measurement range of our instruments.
Here, I want to label cases as "observed" or "unobserved" based on the rank score of a numeric variable.
My attempts so far confuse tables with element values, and I don't know what to try next. I'm sure it will be head-smackingly simple when I see some suggestions.
## generate some data
n_rows <- 20
x <- rnorm(n_rows)
status <- rep("unobserved", n_rows)
data <- data.frame(x, status)
library(dplyr)
## how many observed cases?
n_observed <- 5
## Failure #1
data$status[data$x == dplyr::top_n(data$x, n_observed)] <- "observed"
#> Error in UseMethod("tbl_vars"): no applicable method for 'tbl_vars' applied to an object of class "c('double', 'numeric')"
## Failure #2
data$status[which((data$x == dplyr::top_n(data, x, n_observed)))] <- "observed"
#> Warning in if (n > 0) {: the condition has length > 1 and only the first element will be used
## Failure #3
data$status[top_n(data, x, n_observed) %in% data] <- "observed"
#> Warning in if (n > 0) {: the condition has length > 1 and only the first element will be used
data$status[rank(data$x) <= 5] <- "observed"
or
data$status[rank(-data$x) <= 5] <- "observed"
(depending on your desired ordering)? – Mikael Jagan
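To make the rank-based suggestion concrete, here is a minimal sketch in base R that labels the largest `n_observed` values of `x` as "observed". It reuses the names from the question; the fixed seed, the `is_top` helper vector, `stringsAsFactors = FALSE`, and `ties.method = "first"` are assumptions added for illustration, not part of the original comment. The tie-breaking choice is there so that exactly `n_observed` rows are flagged even if values repeat, which cannot really happen with `rnorm()` but can with rounded or discrete data.

```r
set.seed(42)      # assumption: fixed seed so the sketch is reproducible
n_rows <- 20
n_observed <- 5   # how many cases count as "observed"

# status is kept as character (not factor) so a new label can be assigned
data <- data.frame(x = rnorm(n_rows),
                   status = "unobserved",
                   stringsAsFactors = FALSE)

# rank(-x) gives rank 1 to the largest x; use rank(x) to keep the smallest instead.
# ties.method = "first" breaks ties by position, so exactly n_observed rows qualify.
is_top <- rank(-data$x, ties.method = "first") <= n_observed
data$status[is_top] <- "observed"

table(data$status)
```

The key difference from the failed attempts is that `rank()` returns a plain numeric vector of the same length as `x`, so the comparison yields a logical index that can be used directly inside `[`, whereas `top_n()` expects and returns a data frame.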