0
votes

I'm trying to make some simple box plots, but I've noticed that the points in my data frame seem to be plotted incorrectly in ggplot, in every type of plot I've tried.

My data is

structure(list(rownum = 1:74, Device = c("Dexcom", "Dexcom", 
"Dexcom", "Dexcom", "Dexcom", "Dexcom", "Dexcom", "Dexcom", "Dexcom", 
"Dexcom", "Dexcom", "Dexcom", "Dexcom", "Dexcom", "Dexcom", "Libreview", 
"Libreview", "Libreview", "Libreview", "Libreview", "Libreview", 
"Libreview", "Libreview", "Libreview", "Libreview", "Libreview", 
"Libreview", "Libreview", "Diasend Manual", "Diasend Manual", 
"Diasend Manual", "Diasend Manual", "Diasend Manual", "Diasend Manual", 
"Diasend Manual", "Diasend Manual", "Diasend CGM", "Diasend Manual", 
"Diasend Manual", "Diasend Manual", "Diasend Manual", "Diasend Manual", 
"Diasend CGM", "Diasend CGM", "Diasend Manual", "Diasend Manual", 
"Diasend Manual", "Diasend Manual", "Diasend CGM", "Diasend Manual", 
"Diasend Manual", "Diasend CGM", "Diasend Manual", "Diasend Manual", 
"Diasend Manual", "Diasend Manual", "Diasend Manual", "Diasend CGM", 
"Diasend Manual", "Diasend CGM", "Diasend Manual", "Diasend Manual", 
"Diasend Manual", "Diasend CGM", "Diasend Manual", "Diasend CGM", 
"Diasend Manual", "Diasend Manual", "Diasend Manual", "Diasend Manual", 
"Diasend Manual", "Diasend Manual", "Diasend CGM", "Diasend Manual"
), PREMean = c(10.0484850182022, 7.84715557883709, 7.28766699205132, 
8.47011442894507, 10.7497970736388, 8.6565711351755, 12.2666572965045, 
12.8489327534292, 9.38152123552124, 9.82593283758822, 9.25191807020791, 
10.590004260355, 10.1991015796402, 8.11500023112837, 9.3887371146612, 
9.05289979902383, 16.3938994229184, 11.2269812823576, 8.46589333710567, 
9.45301483336544, 9.654521175124, 9.17169712793734, 5.90663637838715, 
15.1026720647773, 8.73502786461873, 12.515518913676, 10.2021609195402, 
8.88323924469535, 9.138, 10.5977853492334, 14.7827906976744, 
10.9643874643875, 8.04525252525253, 9.2234693877551, 9.2234693877551, 
13.4109826589595, 8.65916169339799, 9.07101449275362, 10.7026923076923, 
17.9097799511002, 6.05655339805825, 7.24913151364764, 7.84826142795985, 
11.6334796926454, 10.0795389048991, 9.63545878693624, 11.7388888888889, 
11.3917218543046, 8.11740335319385, 9.41461318051576, 12.9295681063123, 
10.2035994083164, 7.68975155279503, 10.249885583524, 5.79714285714286, 
10.0638826185102, 8.44704049844237, 10.6952513150205, 9.36492957746479, 
9.83008799318762, 9.6688654353562, 8.00041753653445, 9.26, 9.38389756944444, 
8.55568181818182, 8.63457241816674, 8.12372881355932, 9.84208494208494, 
11.28828125, 9.04013157894737, 11.6740659340659, 9.61797752808989, 
13.8315843798383, 10.1719101123596), POSTMean = c(8.19190208049315, 
7.61158509359437, 7.20120148352596, 8.57923580164976, 10.6268789167925, 
8.37193152150653, 12.3593220150292, 13.9380512091038, 9.30225121492054, 
8.19597861420017, 8.73307014253563, 8.23531795760565, 10.4691064145347, 
8.78835006435006, 9.48096681373489, 9.12521085925145, 13.1253985706432, 
10.2115876974231, 7.65094314018184, 11.1021567021567, 12.3527429320352, 
8.74159058145123, 6.82408707865169, 9.2207729468599, 8.33679846938776, 
11.2045885361817, 12.2492643845594, 8.41001977587343, 8.24191419141914, 
10.7707317073171, 12.2390334572491, 8.28022598870056, 7.67814207650273, 
9.48614130434783, 9.48614130434783, 11.0455128205128, 8.36162310181728, 
10.2825581395349, 10.1807407407407, 16.3283333333333, 7.56851851851852, 
6.80612244897959, 7.6510029661656, 12.1434984833165, 12.2157894736842, 
11.2797101449275, 19.1619047619048, 13.2472361809045, 8.87069342340552, 
8.40763888888889, 13.5286956521739, 10.4632632632633, 8.76877470355731, 
10.6271903323263, 8.2667701863354, 8.61640378548896, 6.96209386281588, 
8.29738799201886, 8.51794871794872, 8.10574666733237, 8.43217993079585, 
7.7244635193133, 13.9224137931034, 9.19426699426699, 8.15335753176044, 
8.30695218383485, 5.89611231101512, 9.45526315789474, 9.406875, 
9.78860759493671, 9.33200934579439, 9.406875, 11.2342145015106, 
11.2984126984127)), row.names = c(NA, -74L), na.action = structure(c(`19` = 19L, 
`30` = 30L, `38` = 38L, `39` = 39L, `42` = 42L, `44` = 44L, `51` = 51L, 
`62` = 62L, `79` = 79L, `84` = 84L), class = "omit"), class = c("tbl_df", 
"tbl", "data.frame"))

Then

ggplot(data, aes(x=PREMean, y=POSTMean)) + geom_point()

Plots some points that are clearly too low - less than 5. None of the numbers are less than 5.

Plotting with ggboxplot and ggpaired also gives me points that are far too low.

I'm tearing my hair out. I just don't understand why the points appear to be plotted incorrectly. Please help, thanks.

1
I think you may be misreading the scale. Try adding geom_hline(yintercept = 5) + geom_vline(xintercept = 5) to your plot. - Richard Telford

1 Answer

1
votes

As @RichardTelford states, your plot is as expected.

I've added both plots to the answer to demonstrate the difference between ggplot's default axis scales and user-defined scales.

ggplot does not know how you will interpret the axis: it takes the minimum and maximum values for each axis, fits them to the space available, and does the best job it can of labelling the tick marks. It relies on the reader to work out that, in the default version using your data, the minor grid lines on the x axis mark steps of 2.5, so the left edge of the x axis is somewhat greater than 5.
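To see roughly where the default axis starts, here is a minimal base R sketch. It assumes ggplot2's default continuous-scale padding of 5% of the data range on each side, and `pre` is only a stand-in vector holding the smallest, one middle, and the largest PREMean value from the question.

```r
# Stand-in vector: the smallest, one middle, and the largest PREMean value
# copied from the question's data (assumption: only the range matters here)
pre <- c(5.79714285714286, 10.0484850182022, 17.9097799511002)

rng <- range(pre)          # data minimum and maximum
pad <- 0.05 * diff(rng)    # assumed default padding: 5% of the data range

# Approximate limits of the plotting area on the x axis
axis_limits <- c(rng[1] - pad, rng[2] + pad)
axis_limits
```

The lower limit comes out a little above 5, so a point sitting near the left edge of the panel is still greater than 5 even though the first labelled tick is further to the right.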

If you want to be explicit about the axis values and breaks, you will have to tell ggplot what to print. You have lots of flexibility: you can set limits, breaks and scale...
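As a rough illustration of setting both limits and breaks, the sketch below uses a small made-up data frame, `demo`, with rounded values in the same range as the question's data. The 0 to 20 limits and the step of 5 are arbitrary choices for the example.

```r
library(ggplot2)

# Hypothetical stand-in for the question's data frame (rounded, a few rows only)
demo <- data.frame(
  PREMean  = c(10.05, 7.85, 5.80, 17.91, 9.14),
  POSTMean = c(8.19, 7.61, 8.27, 16.33, 8.24)
)

p <- ggplot(demo, aes(x = PREMean, y = POSTMean)) +
  geom_point() +
  # Fix the axis range and label every 5 units so 0 and 5 are both visible
  scale_x_continuous(limits = c(0, 20), breaks = seq(0, 20, by = 5)) +
  scale_y_continuous(limits = c(0, 20), breaks = seq(0, 20, by = 5))

p
```

Note that limits set on a scale drop any points falling outside them (with a warning). If you only want to zoom without discarding data, `coord_cartesian(xlim = ..., ylim = ...)` is the usual alternative.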

If you want a particular pair of limits and breaks for a series of graphs, then you may be better off creating a function which does this for you. That's the topic of another question, though; you could look at this answer, which sets the scales from 0 to the limits of the data: Setting y axis breaks in ggplot
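One possible shape for such a function is sketched below. `zero_based_scales()` is a hypothetical helper written for this example (it is not taken from the linked answer), and `demo` is again a made-up stand-in for the question's data.

```r
library(ggplot2)

# Hypothetical helper: builds matching x and y scales that run from 0 to the
# data maximum, rounded up to the next multiple of `step`
zero_based_scales <- function(values, step = 5) {
  upper <- ceiling(max(values, na.rm = TRUE) / step) * step
  brks <- seq(0, upper, by = step)
  list(
    scale_x_continuous(limits = c(0, upper), breaks = brks),
    scale_y_continuous(limits = c(0, upper), breaks = brks)
  )
}

# Made-up stand-in for the question's data frame
demo <- data.frame(
  PREMean  = c(10.05, 7.85, 5.80, 17.91),
  POSTMean = c(8.19, 7.61, 8.27, 16.33)
)

# Build the scales once, then add the same list to every plot in the series
shared_scales <- zero_based_scales(c(demo$PREMean, demo$POSTMean))

p <- ggplot(demo, aes(x = PREMean, y = POSTMean)) +
  geom_point() +
  shared_scales
```

Adding a list of scales with `+` works because ggplot2 accepts a list of components, which keeps the axes identical across plots.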


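library(ggplot2)
library(patchwork)  # provides the `/` operator for stacking plots

# `data` is the data frame from the question

# Default scales: ggplot picks the limits and breaks from the data
p1 <- ggplot(data, aes(x = PREMean, y = POSTMean)) +
  geom_point() +
  ggtitle("Default axis scales")

# User-defined scales: both axes run from 0 to 20
p2 <- ggplot(data, aes(x = PREMean, y = POSTMean)) +
  geom_point() +
  scale_x_continuous(limits = c(0, 20)) +
  scale_y_continuous(limits = c(0, 20)) +
  ggtitle("Defined axis scales")

# Stack p1 above p2
p1 / p2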

Created on 2020-06-27 by the reprex package (v0.3.0)