I am trying to figure out the best way to group my data into factors to use for creating an MDS plot. I have annual discharge data that I would like to break into "high", "medium" and "low" flow years (or just high and low if a medium can't be determined). My issue is that I don't know the best way to do this. I can look at plots and pick the high, med, low, but I want a statistically sound way of doing this rather than just eyeballing it.
I have this data
structure(list(Zone = c("B", "B", "B", "B", "B", "B", "B", "B",
"B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B",
"B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B",
"B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B",
"B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B",
"B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B",
"B", "B", "B", "B"), Year = c(2008L, 2009L, 2010L, 2011L, 2012L,
2013L, 2014L, 2015L, 2016L, 2017L, 2018L, 2008L, 2009L, 2010L,
2011L, 2012L, 2013L, 2014L, 2015L, 2016L, 2017L, 2018L, 2008L,
2009L, 2010L, 2011L, 2012L, 2013L, 2014L, 2015L, 2016L, 2017L,
2018L, 2008L, 2009L, 2010L, 2011L, 2012L, 2013L, 2014L, 2015L,
2016L, 2017L, 2018L, 2008L, 2009L, 2010L, 2011L, 2012L, 2013L,
2014L, 2015L, 2016L, 2017L, 2018L, 2008L, 2009L, 2010L, 2011L,
2012L, 2013L, 2014L, 2015L, 2016L, 2017L, 2018L, 2008L, 2009L,
2010L, 2011L, 2012L, 2013L, 2014L, 2015L, 2016L, 2017L, 2018L
), Month = c(5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 6L,
6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 7L, 7L, 7L, 7L, 7L, 7L,
7L, 7L, 7L, 7L, 7L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L,
9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 10L, 10L, 10L, 10L,
10L, 10L, 10L, 10L, 10L, 10L, 10L, 11L, 11L, 11L, 11L, 11L, 11L,
11L, 11L, 11L, 11L, 11L), Discharge = c(10, 77.8, 273.6, 29.3,
1.5, 86.1, 483.4, 84.8, 53.4, 50.3, 131.3, 3.79, 67.8, 97.1,
10.7, 144.4, 62.6, 171.6, 29.2, 36.2, 179.6, 383.5, 2.39, 32.4,
75.2, 7.85, 436.1, 630.1, 57.2, 27.7, 25.8, 73.7, 314.5, 175.7,
22.8, 68, 4.82, 329.9, 573.8, 34.5, 55.1, 40.1, 150.3, 416.7,
267.3, 37, 65.3, 5.41, 533.9, 255.2, 69.3, 36.8, 158.8, 95.7,
271.1, 45.3, 19.9, 74.3, 2.15, 222.2, 59.5, 36, 29.5, 38.7, 56.4,
65.2, 28.4, 17.2, 19.5, 1.31, 49, 30.8, 29.6, 39.5, 19.5, 35.8,
123.8), Date = c("2008-05", "2009-05", "2010-05", "2011-05",
"2012-05", "2013-05", "2014-05", "2015-05", "2016-05", "2017-05",
"2018-05", "2008-06", "2009-06", "2010-06", "2011-06", "2012-06",
"2013-06", "2014-06", "2015-06", "2016-06", "2017-06", "2018-06",
"2008-07", "2009-07", "2010-07", "2011-07", "2012-07", "2013-07",
"2014-07", "2015-07", "2016-07", "2017-07", "2018-07", "2008-08",
"2009-08", "2010-08", "2011-08", "2012-08", "2013-08", "2014-08",
"2015-08", "2016-08", "2017-08", "2018-08", "2008-09", "2009-09",
"2010-09", "2011-09", "2012-09", "2013-09", "2014-09", "2015-09",
"2016-09", "2017-09", "2018-09", "2008-10", "2009-10", "2010-10",
"2011-10", "2012-10", "2013-10", "2014-10", "2015-10", "2016-10",
"2017-10", "2018-10", "2008-11", "2009-11", "2010-11", "2011-11",
"2012-11", "2013-11", "2014-11", "2015-11", "2016-11", "2017-11",
"2018-11")), row.names = c(228L, 229L, 230L, 231L, 232L, 233L,
234L, 235L, 236L, 237L, 238L, 242L, 243L, 244L, 245L, 246L, 247L,
248L, 249L, 250L, 251L, 252L, 256L, 257L, 258L, 259L, 260L, 261L,
262L, 263L, 264L, 265L, 266L, 270L, 271L, 272L, 273L, 274L, 275L,
276L, 277L, 278L, 279L, 280L, 284L, 285L, 286L, 287L, 288L, 289L,
290L, 291L, 292L, 293L, 294L, 298L, 299L, 300L, 301L, 302L, 303L,
304L, 305L, 306L, 307L, 308L, 312L, 313L, 314L, 315L, 316L, 317L,
318L, 319L, 320L, 321L, 322L), class = "data.frame")
I ran a pairwise comparison using Wilcoxon rank sum test to determine significant differences between discharge rates for each year and got this:
pairwise.wilcox.test( B_Dis_data$Discharge, B_Dis_data$Year, p.adjust.method = "BH")
Pairwise comparisons using Wilcoxon rank sum test
data: B_Dis_data$Discharge and B_Dis_data$Year
2008 2009 2010 2011 2012 2013 2014 2015 2016 2017
2009 0.868 - - - - - - - - -
2010 0.328 0.191 - - - - - - - -
2011 0.283 0.020 0.011 - - - - - - -
2012 0.235 0.133 0.366 0.069 - - - - - -
2013 0.167 0.110 0.781 0.008 1.000 - - - - -
2014 0.366 0.283 0.710 0.008 0.437 0.366 - - - -
2015 0.710 0.654 0.191 0.014 0.133 0.069 0.328 - - -
2016 0.781 0.654 0.208 0.014 0.167 0.110 0.514 1.000 - -
2017 0.328 0.110 0.954 0.008 0.366 0.583 0.710 0.110 0.191 -
2018 0.069 0.014 0.133 0.008 1.000 0.583 0.191 0.011 0.020 0.110
P value adjustment method: BH
Any advice would be greatly appreciated!!
cut()
? – John Harley