Unable to apply ddply-summarise in R correctly

Question

I'm new here and new to R, so please bear with me.

I have a data.frame similar to this:

     time. variable     TEER
1    0.07    cntrl 234.2795
2    1.07    cntrl 602.8245
3    2.07    cntrl 703.6844
4    3.07    cntrl 699.4538
...
48   0.07    cntrl  234.2795
49   1.07    cntrl  602.8245
50   2.07    cntrl  703.6844
51   3.07    cntrl  699.4538
...
471  0.07  agr1111 251.9119
472  1.07  agr1111 480.1573
473  2.07  agr1111 629.3744
474  3.07  agr1111 676.6782
...
518  0.07  agr1111 251.9119
519  1.07  agr1111 480.1573
520  2.07  agr1111 629.3744
521  3.07  agr1111 676.6782
...
753  0.07  agr2222 350.1049
754  1.07  agr2222 306.6072
755  2.07  agr2222 346.0387
756  3.07  agr2222 447.0137
757  4.07  agr2222 530.2433
...
802  2.07  agr2222 346.0387
803  3.07  agr2222 447.0137
804  4.07  agr2222 530.2433
805  5.07  agr2222 591.2122

I'm trying to apply ddply() to this data frame to get a new data frame with means and standard errors (to plot later), like so:

library(plyr)
ddply(data_melt, c("time.", "variable"), summarise,
      mean = mean(TEER),
      sd   = sd(TEER),
      sem  = sd(TEER) / sqrt(length(TEER)))

In the output data frame, the mean column just repeats the TEER values from the first rows of the original data frame, and the sd and sem columns are all zeros. I also get a warning:

Warning message:
In `levels<-`(`*tmp*`, value = if (nl == nL) as.character(labels) else paste0(labels,  :
  duplicated levels in factors are deprecated

It looks like the function only goes through the first part of the data frame and ignores the duplicated rows for each combination of time. and variable?
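One possibility worth ruling out: in the excerpt above, rows 48 to 51 repeat rows 1 to 4 exactly (same time., variable and TEER), and the agr1111 and agr2222 blocks repeat in the same way. If every replicate inside a time./variable group is an identical copy, ddply() is doing what it should: the mean is that single value and the standard deviation is 0, which would match the output described. A minimal check, using a small hypothetical data frame (toy) built from two of the values shown:

```r
library(plyr)

# Hypothetical toy data: each (time., variable) pair appears twice with an
# identical TEER value, mirroring rows 1 to 4 and 48 to 51 of the excerpt
toy <- data.frame(
  time.    = rep(c(0.07, 1.07), times = 2),
  variable = "cntrl",
  TEER     = rep(c(234.2795, 602.8245), times = 2)
)

res <- ddply(toy, c("time.", "variable"), plyr::summarise,
             mean = mean(TEER),
             sd   = sd(TEER),
             sem  = sd(TEER) / sqrt(length(TEER)))

# Each group's mean is just the repeated value; sd and sem come out as 0
print(res)
```

If the real data behaves like this, the zeros point to duplicated rows (for example, from how the data was reshaped) rather than to ddply() itself.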

I've already looked at solutions to similar problems here, but nothing seems to work. Am I missing something, or is this a legitimate problem?

Any help / tips appreciated.

P.S. Let me know if I'm not explaining the problem clearly enough and I'll try to go into more detail.

The second argument to ddply() should be "variable", which is the group-by variable. Also make sure that only this package is loaded: summarize is also present in Hmisc and dplyr, hence the caution. — joel.wilson
Changed it to "variable". Error: nrow(labels) == length(null) is not TRUE came up in addition to that same warning message from before. — Vytautas Guzas
It should work. Make sure your session is actually calling plyr::summarise. — joel.wilson
Made sure that only the plyr package is loaded, but the problem is the same. Also, you are saying that the second argument should only be "variable", but I also want the rows grouped by time, hence c("time.", "variable"). — Vytautas Guzas
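Following the advice about masking, here is a minimal sketch that names plyr's summarise through its namespace, so it no longer matters whether dplyr or Hmisc is attached later, while still grouping by both time. and variable. The data frame toy and its values are hypothetical. find() (from base R's utils package) lists which attached packages define a given name.

```r
library(plyr)

# find() lists every attached package that defines `summarise`;
# more than one entry means one definition is masking another
print(find("summarise"))

# Hypothetical data: two replicates per (time., variable) group
toy <- data.frame(
  time.    = rep(c(0.07, 1.07), times = 4),
  variable = rep(c("cntrl", "agr1111"), each = 4),
  TEER     = c(230, 600, 240, 610, 250, 480, 260, 490)
)

# Group by both columns; plyr::summarise is named explicitly
res <- ddply(toy, c("time.", "variable"), plyr::summarise,
             mean = mean(TEER),
             sd   = sd(TEER),
             sem  = sd(TEER) / sqrt(length(TEER)))
print(res)
```

Passing c("time.", "variable") gives one output row per time/variable combination, which is the grouping asked for above.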

Vytautas Guzas · Accepted Answer · 2017-07-13T23:55:44

I think I've found a way around my problem.

Initially, when I load the data frame, each of the variables ("cntrl", "agr1111", "agr2222") has a unique letter and number attached to it ("A1", "A2", "B1", "B2"), so the labels look like "cntrl.A1" or "agr1111.B2". Instead of stripping the letter-number suffix from each of them with gsub, I tried using filter with grepl to isolate the rows I need and then summarise them. Here's the code:

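library(dplyr)
dt_11 <- dt %>%
        group_by(time.) %>%
        # keep only rows whose label contains "agr1111" (e.g. "agr1111.B2")
        filter(grepl("agr1111", variable)) %>%
        # one row per time point: mean, SD and standard error of the mean
        summarise(avg_11 = mean(teer), 
                  sd_11 = sd(teer),
                  sem_11 = sd(teer)/sqrt(length(teer)))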

This only gives me a data frame for one group ("agr1111"), so I'll have to repeat it two more times, for "cntrl" and "agr2222", which results in three data frames. I should then be able to either merge the data frames or plot them separately on the same graph.
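The answer mentions stripping the letter-number suffix with gsub as the alternative to filtering each group separately. A sketch of that route, assuming every label ends in a dot followed by one capital letter and digits (as in "cntrl.A1" or "agr1111.B2"): once the suffix is gone, grouping by both time. and variable returns all three groups in a single data frame, so no merge step is needed. It uses sub(), which is enough when each label carries a single suffix. The toy data and its values are hypothetical; the column name teer follows the answer's code, and .groups = "drop" assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# Hypothetical data with the suffixed labels described in the answer
dt <- data.frame(
  time.    = rep(c(0.07, 1.07), times = 6),
  variable = rep(c("cntrl.A1", "cntrl.A2", "agr1111.B1",
                   "agr1111.B2", "agr2222.C1", "agr2222.C2"), each = 2),
  teer     = c(230, 600, 240, 610, 250, 480, 260, 490, 350, 300, 360, 310)
)

dt_summary <- dt %>%
  # Drop a trailing ".<letter><digits>" suffix, e.g. "cntrl.A1" -> "cntrl"
  mutate(variable = sub("\\.[A-Z][0-9]+$", "", variable)) %>%
  # One group per time point and condition
  group_by(time., variable) %>%
  summarise(avg_teer = mean(teer),
            sd_teer  = sd(teer),
            sem_teer = sd(teer) / sqrt(n()),
            .groups  = "drop")

print(dt_summary)
```

The result has one row per time point and condition, which can go straight to a plotting call with variable mapped to colour or group.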
