
I am very new to R and NOT an experienced programmer. I am having an issue with ggplot, using geom_area to create a stacked chart of wind directions. I want the stack to run from bottom to top in the order N, NE, E, SE, S, SW, W, NW.

I have succeeded in getting the labels ordered, but now the colours no longer relate to the data on the chart. Below are the various things I tried and the resulting graphs.

The data.frame comes from a different program; a small subset covering 3 days is shown below. The final column is for a solution I found, which is VERY clunky. What concerns me more, however, is that the labels no longer relate to the data in ggplot, and I'm wondering where I went wrong.

My data.frame is as follows and it is called knime.in:

         Day of year WD Binned Count(Time) WD Binned Number
    Row0          119         E         324                3
    Row1          119         N          32                1
    Row2          119        NE         240                2
    Row3          119        NW         149                8
    Row4          119         S          65                5
    Row5          119        SE          94                4
    Row6          119        SW         209                6
    Row7          119         W         279                7
    Row8          120         E         435                3
    Row9          120         N          68                1
    Row10         120        NE         112                2
    Row11         120        NW          46                8
    Row12         120         S          15                5
    Row13         120        SE         130                4
    Row14         120        SW          52                6
    Row15         120         W         588                7
    Row16         121         E         114                3
    Row17         121         N          34                1
    Row18         121        NE           6                2
    Row19         121        NW         282                8
    Row20         121         S          55                5
    Row21         121        SE         101                4
    Row22         121        SW         194                6
    Row23         121         W         594                7
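To reproduce this outside KNIME, here is a minimal base-R sketch that rebuilds the same 24 rows with the original column names. `check.names = FALSE` keeps names such as `WD Binned` that are not syntactically valid. Once those names are kept, backticks are needed to refer to them.

```r
# Rebuild the question's data with its original (non-syntactic) column names.
dirs <- c("E", "N", "NE", "NW", "S", "SE", "SW", "W")  # row order used in the printout

knime.in <- data.frame(
  `Day of year`      = rep(119:121, each = 8),
  `WD Binned`        = rep(dirs, times = 3),
  `Count(Time)`      = c(324, 32, 240, 149, 65, 94, 209, 279,
                         435, 68, 112, 46, 15, 130, 52, 588,
                         114, 34, 6, 282, 55, 101, 194, 594),
  `WD Binned Number` = rep(c(3, 1, 2, 8, 5, 4, 6, 7), times = 3),
  check.names = FALSE   # keep the spaces and parentheses in the names
)

# Backticks let you refer to the non-syntactic names directly
head(knime.in$`Count(Time)`)
```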

First attempt using factor:

require (ggplot2)

knime.in$"WD Binned" <- factor(knime.in$"WD Binned", levels = c("N","NE","E","SE","S","SW","W","NW"))

ggplot(knime.in, aes(x = knime.in$"Day of year", y = (knime.in$"Count(Time)"-1), fill = knime.in$"WD Binned")) +
  geom_area(stat="identity") +
  scale_fill_brewer(palette="BrBG")

Second attempt, using levels:

require (ggplot2)

levels(knime.in$"WD Binned") <- c("N","NE","E","SE","S","SW","W","NW")

ggplot(knime.in, aes(x = knime.in$"Day of year", y = (knime.in$"Count(Time)"-1), fill = knime.in$"WD Binned")) +
  geom_area(stat="identity") +
  scale_fill_brewer(palette="BrBG")

For reference, without any reordering:

require (ggplot2)

ggplot(knime.in, aes(x = knime.in$"Day of year", y = (knime.in$"Count(Time)"-1), fill = knime.in$"WD Binned")) +
  geom_area(stat="identity") +
  scale_fill_brewer(palette="BrBG")

And finally, the kludge that worked: ordering on a numeric column I had to create elsewhere (as I couldn't work out how to order by a user-defined order).

require (ggplot2)

dt <- knime.in[order(knime.in$"WD Binned Number"),] # order the data so that it is stacked in the desired order

dt$"WD Binned" <- factor(dt$"WD Binned", levels = c("N","NE","E","SE","S","SW","W","NW"))

ggplot(dt, aes(x = dt$"Day of year", y = (dt$"Count(Time)"-1)/1440, fill = dt$"WD Binned")) +
  geom_area(stat="identity") +
  scale_fill_brewer(palette="BrBG")
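A helper column is not strictly needed to sort by a user-defined order. Here is a minimal sketch, assuming the direction column holds the labels as strings. `match()` turns each label into its position in a user-defined vector, and `order()` sorts on that position, with day of year as a secondary key. The small data frame `wd` is illustrative only.

```r
# Desired stacking order, bottom to top
dir_order <- c("N", "NE", "E", "SE", "S", "SW", "W", "NW")

# Small illustrative data frame (one day, rows in alphabetical order)
wd <- data.frame(
  `Day of year` = 120,
  `WD Binned`   = c("E", "N", "NE", "NW", "S", "SE", "SW", "W"),
  `Count(Time)` = c(435, 68, 112, 46, 15, 130, 52, 588),
  check.names = FALSE
)

# match() gives each label's position in dir_order; order() sorts by it,
# then by day within each direction
dt <- wd[order(match(wd$`WD Binned`, dir_order), wd$`Day of year`), ]

# Alternative: convert to a factor with the desired levels, then sort on it
wd$`WD Binned` <- factor(wd$`WD Binned`, levels = dir_order)
dt2 <- wd[order(wd$`WD Binned`), ]

dt$`WD Binned`
```

Both routes produce the same row order. The factor route has the advantage that the level order then also drives the legend in ggplot.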

Take day 120 as an example. From the data we should have:

N  = 68
NE = 112
E  = 435
SE = 130
S  = 15
SW = 52
W  = 588
NW = 46
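To check a chart by eye, it helps to know where each band should start and end. This small sketch stacks the day-120 counts in the desired order with `cumsum()`. It uses the raw counts; the plotting code above subtracts 1 from each count, so the k-th boundary in the plot sits k lower.

```r
# Day-120 counts, already in the desired bottom-to-top order
day120 <- c(N = 68, NE = 112, E = 435, SE = 130, S = 15, SW = 52, W = 588, NW = 46)

# The upper edge of each band is the running total; the lower edge is the previous total
top    <- cumsum(day120)
bottom <- top - day120

data.frame(direction = names(day120), bottom = bottom, top = top, row.names = NULL)
```

With the desired order, E should therefore be the band from 180 to 615, and W the band from 812 to 1400.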

If we look at the charts (images not included):

Attempt 1: text labels in the correct order, stacking in "alphabetical" order, colours relate to the labels (so the only issue here is that the stacking is not in the order I want).

Attempt 2: text labels in the correct order, and the real data is still stacked in "alphabetical" order, BUT the colours are stacked in the order I want, so each colour is attached to the wrong data. E.g. N is dark brown in the legend, but the dark brown band on the graph is in fact the data for E.

Attempt 3 (the reference version without any reordering): data, colours and labels are all in sync, BUT not in the order I want.

Final working version: as I wanted all along, stacking from N at the bottom, with the colours and labels of the legend relating to the correct data on the chart.

Many thanks

Peter

Henrik: Start by replacing knime.in$"nameofvariable" in aes with nameofvariable (unquoted).
piashaw: Thanks Henrik, the quotes happen to be how Knime passes the variables to the R code. If I remove the quotes, R doesn't recognise the variables at all. The actual code works, but with unintended effects. Sadly, until I get 10 reputation, I cannot post the images.
aosmith: You can use backticks in aes to use variable names that are not syntactically valid: ggplot(knime.in, aes(x = `Day of year`, y = `Count(Time)`-1, fill = `WD Binned`))...
piashaw: I think my issue with this (as aosmith's suggestion didn't work either) is that I am actually running the R script from within a node of another program, which sends the data out to the Rserver etc. Quite possibly this gets slightly confused by variable names in their simpler form. However, other than looking nicer and being easier to read, what difference (if any) would it have made, given that all my scripts worked (just not quite as expected)?
Jaap: @piashaw Now I see where you went wrong. With levels(knime.in$"WD Binned") <- c("N","NE","E","SE","S","SW","W","NW") you are overwriting the factor levels, not reordering them. Look at your dataset before and after doing this: where the first observation used to be E, it is now N. The correct way to set the order of factor levels is the one I showed in my answer, which you also used in your first attempt.
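Jaap's point about `levels<-` can be seen on a tiny vector. In this base-R sketch, `factor(x, levels = ...)` keeps every value and only changes the level order. Assigning to `levels()` instead renames the existing levels position by position, so the values themselves change.

```r
x <- factor(c("E", "N", "NE"))   # default levels are sorted: "E" "N" "NE"

# Reorder the levels: each element keeps its value
reordered <- factor(x, levels = c("N", "NE", "E"))

# Overwrite the levels: level 1 ("E") is renamed "N", level 2 ("N") becomes "NE", ...
relabelled <- x
levels(relabelled) <- c("N", "NE", "E")

as.character(reordered)   # "E"  "N"  "NE"  (values unchanged)
as.character(relabelled)  # "N"  "NE" "E"   (values changed)
```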

1 Answer


As @Henrik said, you should give your variables syntactically valid names. You can then solve this as follows:

# reading the data (with appropriately named variables)
knime.in <- structure(list(Day.of.year = c(119L, 119L, 119L, 119L, 119L, 119L, 119L, 119L, 120L, 120L, 120L, 120L, 120L, 120L, 120L, 120L, 121L, 121L, 121L, 121L, 121L, 121L, 121L, 121L),
                           WD.Binned = structure(c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L), .Label = c("E", "N", "NE", "NW", "S", "SE", "SW", "W"), class = "factor"),
                           Count = c(324L, 32L, 240L, 149L, 65L, 94L, 209L, 279L, 435L, 68L, 112L, 46L, 15L, 130L, 52L, 588L, 114L, 34L, 6L, 282L, 55L, 101L, 194L, 594L)), .Names = c("Day.of.year", "WD.Binned", "Count"),
                      class = "data.frame", row.names = c(NA, -24L))

# rearranging the factor levels
knime.in$WD.Binned <- factor(knime.in$WD.Binned, levels = c("N","NE","E","SE","S","SW","W","NW"))

# loading required packages
library(ggplot2)
library(dplyr)

# rearranging the data with dplyr ...
knime.in <- knime.in %>% group_by(Day.of.year) %>% arrange(WD.Binned)

# ... or, alternatively, in base R (only one of the two is needed)
knime.in <- knime.in[order(knime.in$WD.Binned),]

# creating the area plot    
ggplot(knime.in, aes(x = Day.of.year, y = (Count-1), fill = WD.Binned)) +
  geom_area(stat="identity") + 
  scale_x_continuous("\nDay of the year", expand=c(0,0), breaks=c(119,120,121)) +
  scale_y_continuous("Count", expand=c(0,0), breaks=c(250,500,750,1000,1250)) +
  scale_fill_brewer(palette="BrBG") +
  theme_classic()

which gives the plot with the directions stacked in the desired order (image not included).


In answer to your comment:

When you read the data with knime.in <- structure(...code...) and plot it, you get the following result (image not included).

Now have a look at the levels of WD.Binned with levels(knime.in$WD.Binned). As you can see, they are in the same order as the legend. Also look at your data frame (with View(knime.in)): the rows are in that same order too. This is because the levels here are alphabetical (the default order that factor() uses), and your rows happen to be sorted alphabetically by direction as well.
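The default level order can be checked directly. This small sketch shows that, without a `levels` argument, `factor()` sorts the unique values (the sort is locale-dependent). It does not keep the order of first appearance, which is what `unique()` returns.

```r
wd <- c("W", "N", "E", "N", "SW")

unique(wd)            # order of first appearance: "W" "N" "E" "SW"
levels(factor(wd))    # sorted: "E" "N" "SW" "W"

# Keep first-appearance order explicitly if that is what you want
levels(factor(wd, levels = unique(wd)))   # "W" "N" "E" "SW"
```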

When you change the order of the levels with knime.in$WD.Binned <- factor(knime.in$WD.Binned, levels=c("N","NE","E","SE","S","SW","W","NW")), you only change the order of the levels, not the order of the data. When you then create a plot, you see that the data is plotted in the order in which it is stored in your data frame (image not included).

Therefore you also have to reorder your data. That is done with: knime.in <- knime.in[order(knime.in$WD.Binned),] (or the dplyr equivalent). Now you can get the plot where the levels are plotted in the right order as I showed in the first plot of this answer.
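A caveat for newer ggplot2 versions: as far as I know, since ggplot2 2.2.0 `position_stack()` orders the stack by the factor levels rather than by row order, with the first level on top to match the legend. In those versions, reordering the rows has no effect on stacking. The sketch below works under that assumption and uses this answer's column names plus the day 119-121 counts from the question. `position_stack(reverse = TRUE)` puts N at the bottom, and `guide_legend(reverse = TRUE)` flips the legend to match.

```r
library(ggplot2)

knime.in <- data.frame(
  Day.of.year = rep(119:121, each = 8),
  WD.Binned   = rep(c("E", "N", "NE", "NW", "S", "SE", "SW", "W"), times = 3),
  Count       = c(324, 32, 240, 149, 65, 94, 209, 279,
                  435, 68, 112, 46, 15, 130, 52, 588,
                  114, 34, 6, 282, 55, 101, 194, 594)
)

# In ggplot2 >= 2.2.0 the level order (not the row order) controls the stacking
knime.in$WD.Binned <- factor(knime.in$WD.Binned,
                             levels = c("N", "NE", "E", "SE", "S", "SW", "W", "NW"))

p <- ggplot(knime.in, aes(x = Day.of.year, y = Count - 1, fill = WD.Binned)) +
  # reverse = TRUE stacks the first level (N) at the bottom
  geom_area(stat = "identity", position = position_stack(reverse = TRUE)) +
  # flip the legend so it reads like the stack, with NW on top
  guides(fill = guide_legend(reverse = TRUE)) +
  scale_fill_brewer(palette = "BrBG") +
  theme_classic()

p
```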