I have a data frame containing one row per observation and a field indicating the sample in which the observation was made. Samples are numbered from 1 to 20. Some samples have no observations and others have several. My objective is to count observations per sample, which I did using plyr and its count function. However, samples with no observations are missing from the output (as they do not appear in the observation data frame). My thought was to tally the number of occurrences in the observation data frame against a numeric vector in R (seq(1:20)).

What I have got is:

library(plyr)  
id= c(1,1,1,4,4,5,6,6,8,8,10,15,15,17,18,21,21) 

These are the sample ids with observations. Samples go from 1:20.

obs=sample(seq(5, 50, by=3),size=17,replace=TRUE)  
df = data.frame(id,obs)   
out<-count(df$id) 

out only includes samples with observations. Samples 2, 3, 7, 9, 11, 12, 13, 14, 16, 19 and 20 all had 0 observations. I want these to be included as such in the output.

1 Answer

One option is to make the sample id a factor, and then use table() to get your count, like so:

id= c(1,1,1,4,4,5,6,6,8,8,10,15,15,17,18,21,21) 
obs=sample(seq(5, 50, by=3),size=17,replace=TRUE)  
df = data.frame(id,obs)    

df$id<-factor(df$id, levels=c(1:21))

out<-table(df$id)  
out

1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 
3  0  0  2  1  2  0  2  0  1  0  0  0  0  2  0  1  1  0  0  2 

The levels argument of factor() lists all the levels, including those not found in the data. I am assuming you wanted id 21 included as well.
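
If id 21 should not be counted, because the samples only run from 1 to 20, a minimal sketch of the same idea restricts the levels to 1:20 and turns the table into a data frame. The column names id and freq and the intermediate name id_f are my own choices for illustration, and I am assuming that out-of-range ids should simply be dropped.

```r
# Sample ids from the question; 21 lies outside the stated 1:20 range
id <- c(1,1,1,4,4,5,6,6,8,8,10,15,15,17,18,21,21)

# Values not listed in `levels` become NA, and table() drops NA by default
id_f <- factor(id, levels = 1:20)

# as.data.frame() on a table gives one row per level, zero counts included
out <- as.data.frame(table(id_f), stringsAsFactors = FALSE)
names(out) <- c("id", "freq")   # assumed column names, not from the question
out$id <- as.integer(out$id)
out
```

This gives one row per sample from 1 to 20, with a count of 0 for samples that have no observations. Note that the two observations with id 21 are silently discarded here, so check the ids first if that is not what you want.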