Let's say I have the following:
myseq <- seq(0, 1, by = 0.1)
scores <- sample(seq(0, 1, by = 0.01), 10)
var1 <- sample(c(0, 1), 10, replace = TRUE)
var2 <- sample(c(0, 1), 10, replace = TRUE)
mydf <- data.frame(scores = scores, var1 = var1, var2 = var2)
myseq
[1] 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0
mydf
   scores var1 var2
1    0.10    1    0
2    0.06    1    0
3    0.74    0    0
4    0.15    1    0
5    0.40    1    1
6    0.96    0    0
7    0.04    1    0
8    0.71    0    1
9    0.94    1    1
10   0.38    0    0
For each value in myseq, I want to sum var1 and var2 for the subset of records where scores is greater than the value in myseq.
I want to do this only using the apply-family functions (apply, lapply, tapply, sapply, mapply, etc.). In other words, no nested for-loops.
So, for example:
The first value in myseq is 0.0. All scores are greater than 0.0, so I want to return var1 = 6 and var2 = 3.
The second value in myseq is 0.1. Only 7 of the 10 scores are greater than 0.1, so I want to return var1 = 3 and var2 = 3.
...so on and so forth...
In the end, I'd like the final output to be an 11(r) x 2(c) matrix (or data frame or list) containing the sums for each var.
var1 var2
6 3
3 3
...
...
Note: 11(r) is because the length of myseq is 11; 2(c) is because there are two vars, var1 and var2.
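To make the target concrete, here is a minimal sketch of one apply-family approach, not necessarily the best one. It assumes the data are hard-coded from the printed mydf above (no seed was set, so sample() cannot regenerate them), and the names threshold, keep, and result are illustrative only.

```r
# Data copied from the printed output in the question.
myseq <- seq(0, 1, by = 0.1)
mydf <- data.frame(
  scores = c(0.10, 0.06, 0.74, 0.15, 0.40, 0.96, 0.04, 0.71, 0.94, 0.38),
  var1   = c(1, 1, 0, 1, 1, 0, 1, 0, 1, 0),
  var2   = c(0, 0, 0, 0, 1, 0, 0, 1, 1, 0)
)

# For each threshold in myseq, keep the rows whose score is strictly
# greater than it and sum var1 and var2 over those rows.
# sapply() returns a 2 x 11 matrix here, so t() flips it to 11 x 2.
result <- t(sapply(myseq, function(threshold) {
  keep <- mydf$scores > threshold
  c(var1 = sum(mydf$var1[keep]), var2 = sum(mydf$var2[keep]))
}))

result
```

With this data, the first two rows of result are 6 3 and 3 3, matching the worked example, and the last row is 0 0 because no score exceeds 1.0.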
Please use set.seed when generating data frames via functions such as rnorm or sample so we can double-check our results against your expected output. – Sotos
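As a rough illustration of the suggestion in this comment, the sketch below seeds the random number generator before building the data frame. The seed value 42 is arbitrary and is an assumption for illustration, so the resulting values will not match the mydf printed in the question.

```r
# Fix the RNG state so the random draws are reproducible.
set.seed(42)  # arbitrary seed, chosen only for illustration
scores <- sample(seq(0, 1, by = 0.01), 10)
var1 <- sample(c(0, 1), 10, replace = TRUE)
var2 <- sample(c(0, 1), 10, replace = TRUE)
mydf <- data.frame(scores = scores, var1 = var1, var2 = var2)

# Resetting the same seed and repeating the same draws in the same
# order reproduces the data frame.
set.seed(42)
scores_again <- sample(seq(0, 1, by = 0.01), 10)
var1_again <- sample(c(0, 1), 10, replace = TRUE)
var2_again <- sample(c(0, 1), 10, replace = TRUE)
mydf_again <- data.frame(scores = scores_again, var1 = var1_again, var2 = var2_again)

identical(mydf, mydf_again)
```

Anyone who runs the same seeded code on the same R version should get the same mydf, which is what lets others compare their results with the expected output.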