0
votes

Let me try to rephrase my question. I have the following data frame, bb1, and I'm casting it using dcast from reshape2. In this example I want to count how many observations I have in the "rt" column for each "subject" by "condition" (subject ~ condition), but I want only observations whose "z.score" meets a certain condition. In the example below I used abs(z.score) > 1.5, but the threshold will sometimes be 1.5, sometimes 1 and sometimes 2; the 1.5 is just an example. Likewise, the example below calculates the length, but I would also like to be able to calculate the mean (e.g., the mean of the "rt" column for each "subject" by "condition", only for observations that have "z.score" > 1.5), so the length is just an example here.

require(reshape2)
require(dplyr)
bb1 = data.frame(subject=c(99,99,99,99,99,11,11,11), rt=c(100,150,2,4,10,15,1,2),  ac=rep(1,8),
condition=c(1,1,2,4,3,3,4,4), z.score=c(0.2,0.3,0.2,0.3,0.3,0.2,0.2,0.2))

> bb1
#     subject  rt ac condition z.score
# 1      99 100  1         1     0.2
# 2      99 150  1         1     0.3
# 3      99   2  1         2     0.2
# 4      99   4  1         4     0.3
# 5      99  10  1         3     0.3
# 6      11  15  1         3     0.2
# 7      11   1  1         4     0.2
# 8      11   2  1         4     0.2

bb1 %>% 
  group_by(subject, condition) %>% 
  summarise(n = length(rt[abs(z.score) > 1.5])) %>% 
  dcast(subject ~ condition, value.var = "n")

#   subject  1  2 3 4
# 1      11 NA NA 0 0
# 2      99  0  0 0 0

My question is: how should I use the dcast part if I want to calculate value.var = "n" for each subject, and not for each subject by condition? I want to get the value for each subject across conditions. This actually means that I want to calculate the margins for each row. But I don't want the values for subject ~ condition; I want only the margins (i.e., the value for each subject across conditions), saved as a data.frame. For bb1 above I would like to get something like this:

#   subject rt
# 1 11  0
# 2 99  0

Because neither subject (i.e., subject 11 and subject 99) has observations in any of the conditions that meet the z.score restriction, I need to get 0 for both.

I hope my question is better now

Any help will be greatly appreciated. Thank you, Ayala

You might want to work on your example... all your z.scores are 0.2 or 0.3, but your condition for them to be counted is > 1.5. You could simplify your summarise as-is with n = sum(abs(z.score) > 1.5). But given your 1.5 threshold, your output is correct. - Gregor Thomas
Could you add an example of what you want the output to look like? Possibly you want to add row margins (add margins = "condition" to dcast) or maybe you want dcast(subject ~ ., value.var = "n")? - aosmith
@aosmith I rephrased my question. I tried dcast(subject ~ ., value.var = "n"), but then I get 2 and 4 (for subjects 11 and 99, respectively), and I should get 0 for both because across conditions they don't have observations that meet my z.score restriction. - ayalaall

2 Answers

1
votes

It looks like you just want to make a summary dataset for each subject, showing the number of times each subject meets your z.score condition. Using dplyr (one of many options for group summaries):

bb1 %>% group_by(subject) %>% 
    summarise(rt = sum(abs(z.score) > 1.5)) 

Source: local data frame [2 x 2]

  subject rt
1      11  0
2      99  0

If you really want to use dcast for this, just change your aggregation function from the default length to sum. Notice you can name the new column by putting the desired name in quotes on the right-hand side of the tilde (~) in dcast.

bb1 %>% group_by(subject, condition) %>% 
    summarise(n = sum(abs(z.score) > 1.5))  %>% 
    dcast(subject ~ "rt", value.var = "n", fun.aggregate = sum)

  subject rt
1      11  0
2      99  0
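
Since the question mentions that the threshold changes (1, 1.5, 2) and that a mean of rt is wanted as well as a count, here is a minimal sketch of how the per-subject summary could be wrapped in a function. The helper name summarise_outliers and the column names n and mean_rt are made up for illustration; they are not from the question. Note that mean() of an empty selection is NaN, so subjects with no qualifying rows get n = 0 but mean_rt = NaN.

```r
library(dplyr)

# Same data as in the question
bb1 <- data.frame(
  subject   = c(99, 99, 99, 99, 99, 11, 11, 11),
  rt        = c(100, 150, 2, 4, 10, 15, 1, 2),
  ac        = rep(1, 8),
  condition = c(1, 1, 2, 4, 3, 3, 4, 4),
  z.score   = c(0.2, 0.3, 0.2, 0.3, 0.3, 0.2, 0.2, 0.2)
)

# Hypothetical helper: for each subject, count the rows whose |z.score|
# exceeds `threshold` and take the mean of rt over those same rows.
summarise_outliers <- function(data, threshold) {
  data %>%
    group_by(subject) %>%
    summarise(
      n       = sum(abs(z.score) > threshold),       # TRUE counts as 1
      mean_rt = mean(rt[abs(z.score) > threshold])   # NaN if nothing qualifies
    ) %>%
    as.data.frame()
}

res_strict <- summarise_outliers(bb1, 1.5)   # threshold from the question
res_loose  <- summarise_outliers(bb1, 0.25)  # lower threshold, so some rows qualify
```

With the 1.5 threshold both subjects get n = 0; the 0.25 threshold is only there to show a non-trivial result on this data.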
0
votes

So you just want to know how many measurements you have for each subject? I'm not sure I understand what you're looking for, but if I've got it right, then I'd just use plyr like this:

 library(plyr)
 ddply(bb1, c("subject"), function(x) nrow(x))

EDIT: I like beginneR's answer, slightly amended. If you want to count the number of observations per subject with a z.score above some value (I don't see any above 1.5 so I'm using 0.2 as an example), here's one way:

 count(bb1[bb1$z.score > 0.2, ], "subject")
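
One thing to watch with filtering first: a subject with no rows above the threshold disappears from the filtered data, so it cannot show up with a count of 0, which is what the question asks for. A rough base R sketch that keeps every subject is below. It uses tapply instead of plyr, and the names threshold, counts and result are just placeholders.

```r
# Same data as in the question
bb1 <- data.frame(
  subject   = c(99, 99, 99, 99, 99, 11, 11, 11),
  rt        = c(100, 150, 2, 4, 10, 15, 1, 2),
  ac        = rep(1, 8),
  condition = c(1, 1, 2, 4, 3, 3, 4, 4),
  z.score   = c(0.2, 0.3, 0.2, 0.3, 0.3, 0.2, 0.2, 0.2)
)

threshold <- 0.2  # example value, as in the answer above

# Sum the logical flag within each subject instead of subsetting first,
# so subjects with no matching rows stay in the result with a count of 0.
counts <- tapply(bb1$z.score > threshold, bb1$subject, sum)

result <- data.frame(
  subject = as.numeric(names(counts)),
  n       = as.vector(counts)
)
```

Here subject 11 is kept with n = 0, while subject 99 gets the count of its rows above the threshold.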