
I am puzzled by the output of the code below. The data frame I called aux (the data) contains a factor (nb.NAs) and a quantitative variable (values). I want to plot the mean of the quantitative variable for each level of the factor.

The code also creates a second data frame, aux.grouped, containing those group means.

Then there are two plots. The first one is fine by me: it plots the right values in two different ways, that is using stat_summary() on the original aux data frame or geom_point() on the aux.grouped data frame.

However, when I try to plot the log10 values of the quantitative variable, stat_summary() does not plot what I expected. I understand that using log10() inside aes() in the ggplot() call may be at the origin of this issue. What I do not understand is what stat_summary() is plotting instead and, if this is an unmatched mapping issue, why it does not simply plot the non-log10 values.

Thanks a lot for your help.

Best,

David

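library(dplyr)    # %>%, group_by(), summarise()
library(ggplot2)  # ggplot(), stat_summary(), geom_point()

aux <- read.table("aux.txt", header = TRUE, sep = "\t")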

aux$nb.NAs <- factor(aux$nb.NAs)

aux.grouped <- aux %>% 
  group_by(nb.NAs) %>% 
  dplyr::summarise(mean_values = mean(values))

ggplot(aux, aes(x = nb.NAs, y = values, group = nb.NAs)) +
  stat_summary(geom = "point", fun = "mean", colour = "red", size = 10) +
  geom_point(data = aux.grouped, aes(x = nb.NAs, y = mean_values), colour = "blue", size = 5)
                        
ggplot(aux, aes(x = nb.NAs, y = log10(values), group = nb.NAs)) +
  stat_summary(geom = "point", fun = "mean", colour = "red", size = 5) +
  geom_point(data = aux.grouped, aes(x = nb.NAs, y = log10(mean_values)), colour = "blue", size = 5) 
Can you provide a sample of your dataset so we can re-run that code? You can do this with dput(data). - Leo Ohyama
Hi Leo, like this? - David Rengel
structure(list(nb.NAs = structure(c(2L, 1L, 1L, 2L, 2L, 1L, 2L, 2L, 1L, 1L), .Label = c("2", "3"), class = "factor"), values = c(5584949.80357048, 8014873.492117, 17206608.4238154, 1524223.86730749, 5882593.98508629, 19907181.0901551, 4945004.91561103, 20886241.7691373, 51093766.9511132, 6436423.4434915)), row.names = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 12L, 16L), class = "data.frame") - David Rengel
It is not the whole thing, but I guess it will do. The original aux dataset is available at the link in my original post. Thanks. - David Rengel

1 Answer


I think this answers your question.

library(dplyr)  # %>%, group_by(), summarise()

# Sample of the data, taken from the dput() output in the comments
df <- structure(list(nb.NAs = structure(c(2L, 1L, 1L, 2L, 2L, 1L, 2L, 2L, 1L, 1L),
                                        .Label = c("2", "3"), class = "factor"),
                     values = c(5584949.80357048, 8014873.492117, 17206608.4238154,
                                1524223.86730749, 5882593.98508629, 19907181.0901551,
                                4945004.91561103, 20886241.7691373, 51093766.9511132,
                                6436423.4434915)),
                row.names = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 12L, 16L), class = "data.frame")


df$nb.NAs <- factor(df$nb.NAs)

aux.grouped <- df %>% 
  group_by(nb.NAs) %>% 
  dplyr::summarise(mean_values = mean(values), mean_log10 = mean(log10(values)), 
                   log10_mean = log10(mean(values)))

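When you run this code you'll see that I calculated the log10 summary in two different ways: mean_log10 is the mean of the log10 values, while log10_mean is the log10 of the mean. In your second plot, stat_summary() receives y = log10(values), because the aesthetic mapping is evaluated before the statistic is computed. The red points are therefore the mean of the log10 values (7.19 and 6.74), whereas the blue points are log10(mean_values), the log10 of the mean (7.31 and 6.89). This is why you're getting differences between the red and blue points. You can match these values to your second plot and see the difference.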

# A tibble: 2 x 4
  nb.NAs mean_values mean_log10 log10_mean
* <fct>        <dbl>      <dbl>      <dbl>
1 2        20531771.       7.19       7.31
2 3         7764603.       6.74       6.89
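If what you want is for the red points to land on the blue ones, one option is to keep y mapped to the raw values and take the log inside the summary function instead. This is only a sketch on the sample data above: log10_of_mean is a helper name I made up (it is not part of ggplot2), and I built the grouped data frame with base R's tapply() just to keep the example short.

```r
library(ggplot2)

# Sample data from the dput() output in the comments
df <- data.frame(
  nb.NAs = factor(c("3", "2", "2", "3", "3", "2", "3", "3", "2", "2")),
  values = c(5584949.80357048, 8014873.492117, 17206608.4238154,
             1524223.86730749, 5882593.98508629, 19907181.0901551,
             4945004.91561103, 20886241.7691373, 51093766.9511132,
             6436423.4434915)
)

# Group means computed outside ggplot (same role as aux.grouped)
means <- tapply(df$values, df$nb.NAs, mean)
aux.grouped <- data.frame(nb.NAs = factor(names(means)),
                          mean_values = as.numeric(means))

# Hypothetical helper (not a ggplot2 function): log10 of the arithmetic mean
log10_of_mean <- function(x) log10(mean(x))

p <- ggplot(df, aes(x = nb.NAs, y = values, group = nb.NAs)) +
  # y is the raw values here, so the log is applied after averaging
  stat_summary(geom = "point", fun = log10_of_mean, colour = "red", size = 10) +
  # blue points: log10 of the precomputed group means
  geom_point(data = aux.grouped, aes(x = nb.NAs, y = log10(mean_values)),
             colour = "blue", size = 5) +
  labs(y = "log10(mean of values)")
```

Calling print(p) draws the plot, and the red and blue points should now coincide. If you would rather plot the mean of the log10 values, keep your original y = log10(values) mapping and compute mean(log10(values)) in the grouped data frame for the blue points instead.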