I'm working with some wind direction data for a potential paper. I am trying to compare the number of days the wind is blowing easterly (negative U) and the number of days it is blowing westerly (positive U). I need to calculate this over an austral summer, so the period between October and March e.g.: October 1993 to March 1994.

Here is a sample of my data frame:

 Year Month Day Hour Minutes Seconds       Ws         U         V
1  1993     1   1    0       0       0 3.750620  2.822403  1.281318
2  1993     1   1    6       0       0 4.207054  3.600465  1.719147
3  1993     1   1   12       0       0 5.050543  3.155271  3.243411
4  1993     1   1   18       0       0 3.165194 -0.477054  2.926124
5  1993     1   2    0       0       0 1.529690 -0.721395 -0.503101
6  1993     1   2    6       0       0 1.950233  0.303333 -1.728295
7  1993     1   2   12       0       0 4.548992 -2.868217  3.307519
8  1993     1   2   18       0       0 6.563643 -6.245194  1.744419
9  1993     1   3    0       0       0 5.868992 -5.805969 -0.594031
10 1993     1   3    6       0       0 6.530620 -6.446667 -0.689535
11 1993     1   3   12       0       0 7.085736 -6.657984  1.834884
12 1993     1   3   18       0       0 7.685349 -7.111008  2.571783
13 1993     1   4    0       0       0 6.508760 -6.414574 -0.678837
14 1993     1   4    6       0       0 6.141860 -6.006822 -0.272558
15 1993     1   4   12       0       0 7.388295 -6.744574  1.862868
16 1993     1   4   18       0       0 7.281163 -7.054264  0.896512
17 1993     1   5    0       0       0 4.847287 -4.431628 -0.813643
18 1993     1   5    6       0       0 3.482558 -1.670078  2.048915
19 1993     1   5   12       0       0 5.698992  1.097287  5.433721
20 1993     1   5   18       0       0 4.894031  1.445736  4.440465
21 1993     1   6    0       0       0 1.983411  0.783023  1.556047
22 1993     1   6    6       0       0 2.315891 -1.225891  1.756744
23 1993     1   6   12       0       0 4.525581 -4.016124  1.723721
24 1993     1   6   18       0       0 5.123566 -4.618682  0.759225
25 1993     1   7    0       0       0 3.449147 -2.639457 -1.627442
26 1993     1   7    6       0       0 2.067364  1.185891 -0.760233
27 1993     1   7   12       0       0 5.675814  3.872171  3.419690
28 1993     1   7   18       0       0 6.278450  3.989767  4.684031
29 1993     1   8    0       0       0 6.562636  5.496667  3.329302
30 1993     1   8    6       0       0 7.762636  5.280310  5.516589
31 1993     1   8   12       0       0 9.283953  5.575659  7.294264

So far I have managed to do this calculation for one month only (see code below), but I'm unsure how to do it from October of one year to March of the next year. When I tried filter(wind,Year==1993:1994,Month==10:3,U>0) I got this warning message:

In Month == 10:3 : longer object length is not a multiple of shorter object length

This is what I have done so far with calculating the number of positive and negative directions for October 1993, which has worked. I am new to R and stackoverflow, so I hope I have set this out correctly!

filter(wind,Year==1993,Month==10,U>0)
Oct_1993_pos<-filter(wind,Year==1993,Month==10,U>0)
Oct_1993_pos

filter(wind,Year==1993,Month==10,U<0)
Oct_1993_neg<-filter(wind,Year==1993,Month==10,U<0)
Oct_1993_neg

sum(Oct_1993_pos$U>0)
sum(Oct_1993_neg$U<0)
Please add where the filter function is coming from. Also provide a sample of your data with dput(head(df,n)). - NelsonGon
See group_by and summarize from dplyr. - A. Suliman
@NelsonGon the filter function comes from dplyr. I will provide a sample of my data as soon as I figure out how to - apologies. - Clara Steyn
@NelsonGon I managed to add a sample of my data, I hope that is helpful. - Clara Steyn

1 Answer

Your first problem (Month == 10:3) is a warning rather than an error. It occurs because you are comparing a vector (Month) with another vector (10:3, i.e. 10, 9, ..., 3). R does this comparison element-wise, i.e. Month[1] == 10, Month[2] == 9, etc. When the vectors are of unequal length, R recycles the shorter one, and it warns you when the longer length is not an exact multiple of the shorter length:

c(1,2,3,1,2,3) == c(1,2)
[1]  TRUE  TRUE FALSE FALSE FALSE FALSE
c(1,2,3,1,2) == c(1,2)
[1]  TRUE  TRUE FALSE FALSE FALSE
Warning message:
In c(1, 2, 3, 1, 2) == c(1, 2) :
  longer object length is not a multiple of shorter object length
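
To test whether each month belongs to a set of months, %in% is the usual base R tool, since it checks membership instead of comparing position by position. A minimal sketch, using a made-up Month vector rather than the real wind data (the names Month, summer_months and is_summer are only for illustration):

```r
# Toy month vector (hypothetical values, not the wind data)
Month <- c(1, 3, 4, 9, 10, 12)

# Austral summer months: October-December plus January-March
summer_months <- c(10:12, 1:3)

# %in% tests every element of Month for membership in summer_months,
# so no recycling takes place and no warning is raised
is_summer <- Month %in% summer_months
is_summer
```

Note that this only selects the right months; it does not yet tie January-March to the preceding October-December, which matters once the data spans several years.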

For counting positive and negative U's, you can exploit that summing logicals simply counts the number of TRUEs:

sum(c(FALSE, TRUE, TRUE, FALSE))
[1] 2

And you can obtain such logicals by doing a simple comparison:

sum(wind$U > 0)

For your calculations I would recommend using dplyr. With this you can repeat your counting across any collection of subsets. Try:

# if following fails, run install.packages("dplyr")
library(dplyr)
monthly <- wind %>% group_by(Year, Month) %>%
  summarise(
    pos=sum(U > 0), 
    neg=sum(U < 0), 
    nowind=sum(U == 0), 
    entries=n()
  )
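
Note that pos and neg above count 6-hourly records, not days. If you really need a number of days, one possible approach (an assumption about what "a westerly day" should mean, not the only definition) is to average U per day first and then count the sign of the daily mean. A base R sketch on a toy data frame whose U values are rounded from the first two days of the sample:

```r
# Toy data: two days with four 6-hourly records each
# (U values rounded from the sample in the question)
toy <- data.frame(
  Year  = 1993,
  Month = 1,
  Day   = c(1, 1, 1, 1, 2, 2, 2, 2),
  U     = c(2.82, 3.60, 3.16, -0.48, -0.72, 0.30, -2.87, -6.25)
)

# One row per calendar day, holding the mean U of that day
daily <- aggregate(U ~ Year + Month + Day, data = toy, FUN = mean)

# Count days by the sign of their mean U
westerly_days <- sum(daily$U > 0)
easterly_days <- sum(daily$U < 0)
```

Other definitions are equally defensible, for example calling a day westerly when most of its records have U > 0, so choose the one that fits your paper.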

Edit in response to comment:

Depending on whether you need intermediate results or not, we could do a couple of things. Regarding the period October to March, you have to be careful if your data spans several years, because each period covers parts of two calendar years.

monthly %>% filter((Month >= 10 & Year == 1993) | (Month <= 3 & Year == 1994)) %>% ungroup() %>%
  summarise_at(vars(pos, neg, nowind, entries), sum)

or, just filter before you summarise:

wind %>% filter((Month >= 10 & Year == 1993) | (Month <= 3 & Year == 1994)) %>%
  summarise(
    pos=sum(U > 0), 
    neg=sum(U < 0), 
    nowind=sum(U == 0), 
    entries=n()
  )

Beware here that I am using the single boolean operators (|, &) and not the double ones (||, &&), as we want to keep the element-wise comparisons. The double variants only work on a single element and are meant for if() conditions, not for filtering rows.
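
A small illustration of the element-wise behaviour, using made-up vectors u and month (hypothetical values, not the wind data):

```r
# Toy vectors (hypothetical values)
u     <- c(2.8, -0.5, 0, 3.1)
month <- c(10, 11, 2, 5)

# & compares position by position and returns one logical per element
keep <- u > 0 & month >= 10
keep

# Summing the logicals counts the rows that satisfy both conditions
n_keep <- sum(keep)
```

With && the same expression does not give a per-row result: as far as I know, R 4.3.0 and later raise an error when either side is longer than one element, while older versions silently used only the first element.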

If you want to see winter vs. summer periods across multiple years, we have to figure out how to group the seasons correctly. For this, I will build a data set of years and months:

library(tidyr)
seasons <- crossing(month=1:12, year=1992:1994) %>% arrange(year, month) %>%
  mutate(
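    # a new season starts in April (winter) and in October (summer),
    # so that October-March stays together as one season
    season_start = month %in% c(4, 10),
    season = cumsum(season_start)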
  )

With this approach, we've split the problem in two: 1) define the seasons you wish to summarise over, and 2) summarise over them.

inner_join(wind, seasons, by=c('Year'='year','Month'='month')) %>%
  group_by(season) %>%
  summarise(
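    # first() rather than min(): a season that crosses New Year has its
    # smallest Month in January; this assumes rows are in chronological order
    seasonstart = paste0(first(Year), '-', first(Month)),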
    pos=sum(U > 0), 
    neg=sum(U < 0), 
    nowind=sum(U == 0), 
    entries=n()
  )

So the summary itself is the same as before; to cover October-March periods, you just define a different grouping.
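
If you only care about the October-March periods, an alternative to a lookup table is to compute a season label directly from Year and Month. This is a sketch in base R on a made-up data frame; the column name season_year is my own invention and labels each summer by the year in which it started:

```r
# Hypothetical mini data set with the same Year/Month/U columns as `wind`
toy <- data.frame(
  Year  = c(1993, 1993, 1993, 1994, 1994, 1994),
  Month = c(9, 10, 12, 1, 3, 4),
  U     = c(1.2, -0.4, 2.5, -3.1, 0.8, -1.0)
)

# Keep only the austral-summer months (October-March)
summer <- toy[toy$Month %in% c(10:12, 1:3), ]

# October-December belong to the summer starting that year;
# January-March belong to the summer that started the previous year
summer$season_year <- ifelse(summer$Month >= 10, summer$Year, summer$Year - 1)

# Count westerly (U > 0) and easterly (U < 0) records per summer
pos <- tapply(summer$U > 0, summer$season_year, sum)
neg <- tapply(summer$U < 0, summer$season_year, sum)
```

Here the rows for October 1993 to March 1994 all end up with season_year 1993, so they are counted together even though they span two calendar years.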

As an exercise, try adding Year and/or Month to the group_by call in the last example.