
I am trying to convert long-format wind data into wide format. Both wind speed and wind direction are listed in the Parameter.Name column, and the values need to be cast by both the Local.Site.Name and Date.Local variables.

If there are multiple observations for a unique Local.Site.Name + Date.Local combination, I want the mean of those observations. The built-in argument fun.aggregate = mean works fine for wind speed, but the mean wind direction cannot be computed this way because the values are angles in degrees. For example, the arithmetic mean of two directions near north, 350 and 10, is (350 + 10)/2 = 180, which points south, even though the circular (polar) mean is 0 (equivalently 360).
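For reference, the circular (vector) mean turns each direction into a unit vector, averages the x and y components, and converts the mean vector back to an angle with atan2(). A minimal base-R sketch, assuming directions in degrees; circ_mean_deg is a hypothetical helper name, not a function from any package:

```r
# Hypothetical helper (not from any package): circular mean of angles in degrees.
# Each angle becomes a unit vector (cos, sin); the components are averaged and
# atan2() converts the mean vector back into an angle, wrapped into [0, 360).
# If the vectors cancel out (e.g. 90 and 270) the mean direction is undefined
# and the returned value is arbitrary.
circ_mean_deg <- function(x, na.rm = TRUE) {
  if (na.rm) x <- x[!is.na(x)]
  if (length(x) == 0) return(NA_real_)
  rad <- x * pi / 180
  ang <- atan2(mean(sin(rad)), mean(cos(rad))) * 180 / pi
  ang %% 360
}

mean(c(350, 10))           # arithmetic mean: 180, i.e. south
circ_mean_deg(c(350, 10))  # circular mean: close to 0 (or to 360, from rounding)
```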

The 'circular' package can compute the mean wind direction without my having to write any trigonometry, but I am having trouble nesting this additional function inside the 'fun.aggregate' argument. I thought a simple if/else if statement would do the trick, but I am running into the following error:

Error in vaggregate(.value = value, .group = overall, .fun = fun.aggregate, : could not find function ".fun"
In addition: Warning messages:
1: In if (wind$Parameter.Name == "Wind Direction - Resultant") { :
    the condition has length > 1 and only the first element will be used
2: In if (wind$Parameter.Name == "Wind Speed - Resultant") { :
    the condition has length > 1 and only the first element will be used     
3: In mean.default(wind$"Wind Speed - Resultant") :
    argument is not numeric or logical: returning NA
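These messages can be reproduced in miniature. The if/else if expression passed as fun.aggregate in the script is evaluated once, against whole columns of wind, so dcast receives its result (NA) instead of a function. A small base-R sketch using two made-up rows (not the EPA file):

```r
# Made-up long-format rows, for illustration only
wind <- data.frame(
  Parameter.Name  = c("Wind Speed - Resultant", "Wind Direction - Resultant"),
  Arithmetic.Mean = c(3.2, 350),
  stringsAsFactors = FALSE
)

# 1. The if() condition is a whole logical column, not a single TRUE/FALSE,
#    hence "the condition has length > 1" (R 4.2.0 and later make this an error).
cond <- wind$Parameter.Name == "Wind Direction - Resultant"
length(cond)

# 2. In long format there is no column named "Wind Speed - Resultant",
#    so this lookup is NULL and mean() warns and returns NA.
col <- wind$"Wind Speed - Resultant"
res <- suppressWarnings(mean(col))

# 3. That NA is what ends up as fun.aggregate; it is not a function,
#    which is why dcast reports 'could not find function ".fun"'.
is.function(res)
```

In other words, fun.aggregate has to be a function such as function(x) mean(x), and dcast calls it with only the values that fall into one cell; it never sees Parameter.Name, which is why the answers split the data by parameter first.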

The goal is to use fun.aggregate = mean for wind speed, but mean(circular(x, units = 'degrees')) for wind direction.

Here's the original data (>100MB): https://drive.google.com/open?id=0By6o_bZ8CGwuUUhGdk9ONTgtT0E

Here's a subset of the data (1st 100 rows): https://drive.google.com/open?id=0By6o_bZ8CGwucVZGT0pBQlFzT2M

Here's my script:

library(reshape2)
library(dplyr)
library(circular)

#read in the long format data:
wind <- read.csv("<INSERT_FILE_PATH_HERE>", header = TRUE)

#cast into wide format:
wind.w <- dcast(wind, 
            Local.Site.Name + Date.Local ~ Parameter.Name,
            value.var = "Arithmetic.Mean", 
            fun.aggregate = (
              if (wind$Parameter.Name == "Wind Direction - Resultant") {
                mean(circular(wind$"Wind Direction - Resultant", units = 'degrees'))
              }
              else if (wind$Parameter.Name == "Wind Speed - Resultant") {
                mean(wind$"Wind Speed - Resultant")
              }),
            na.rm = TRUE)

Any help would be greatly appreciated!

-spacedSparking

EDIT: HERE'S THE SOLUTION:

library(reshape2)
library(SDMTools)
library(dplyr)
#read in the EPA wind data:
#This data is publicly accessible, and can be found here: https://aqsdr1.epa.gov/aqsweb/aqstmp/airdata/download_files.html    
wind <- read.csv("daily_WIND_2016.csv", sep = ',', header = TRUE, stringsAsFactors = FALSE)

#convert long format wind speed data by date and site id:
wind_speed <- dcast(wind, 
                    Local.Site.Name + Date.Local ~ Parameter.Name,
                    value.var = "Arithmetic.Mean",
                    fun.aggregate = function(x) {
                      mean(x, na.rm=TRUE)
                    },
                    subset = .(Parameter.Name == "Wind Speed - Resultant")
)

#convert long format wind direction data into wide format by date and local site id:
wind_direction <- dcast(wind, 
                        Local.Site.Name + Date.Local ~ Parameter.Name,
                        value.var = "Arithmetic.Mean",
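                        fun.aggregate = function(x) {
                          # dcast also calls this on a zero-length vector to get its
                          # default fill value, so return NA instead of averaging nothing
                          if (length(x) > 0)
                            circular.averaging(x, deg = TRUE)
                          else
                            NA_real_
                        },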
                        subset= .(Parameter.Name == "Wind Direction - Resultant")
)

#join the wide-format wind_speed and wind_direction data frames on their shared Local.Site.Name and Date.Local columns:
wind.w <- merge(wind_speed, wind_direction)
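Note that merge() without a by argument joins on all shared columns and, by default, keeps only rows present in both tables, so a site/day that has a speed but no direction is dropped. A toy sketch with made-up values showing the difference all = TRUE makes:

```r
# Made-up wide-format pieces: site B has a speed but no direction
ws <- data.frame(Local.Site.Name = c("A", "B"), Date.Local = "2016-01-01",
                 speed = c(3, 5), stringsAsFactors = FALSE)
wd <- data.frame(Local.Site.Name = "A", Date.Local = "2016-01-01",
                 direction = 0, stringsAsFactors = FALSE)

inner <- merge(ws, wd)              # default inner join: only site A survives
outer <- merge(ws, wd, all = TRUE)  # outer join: site B kept with direction = NA
```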
You should clip off the top of your data file to the first 100 lines or so and post that here. Making everyone who wants to answer your question download 106MB is liable to reduce the number of helpers. – Richard
I made sure to trim the data down to 100 lines. Thanks for the suggestion, I am new to Stack Overflow! – philiporlando
Thanks, that's much easier to work with, but have you verified that this small dataset still shows the problem you're trying to solve? Your goal on SO is to make the resources needed to understand and answer your question as accessible as possible. – Richard
That is a good point. I've tested the smaller dataset and it results in the same error message as the original data. Thanks! – philiporlando
Alright, I can do that. Both solutions work, but I wanted to post a solution with as few lines of code as possible. – philiporlando

3 Answers


You're using wind.w inside of the code which defines wind.w - that's not going to work!

You're also using backticks (`) instead of straight quote marks ('). Straight quote marks should be used to delimit a string.
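For reference, base R accepts several forms for a column whose name contains spaces: backticks are the general way to refer to a non-syntactic name, a quoted string also works after $, and [[ ]] takes an ordinary character string. A small sketch with a made-up data frame:

```r
# check.names = FALSE keeps the spaces and dashes in the column name
df <- data.frame("Wind Speed - Resultant" = c(2, 4), check.names = FALSE)

v1 <- df$`Wind Speed - Resultant`    # backticks: a non-syntactic name
v2 <- df$"Wind Speed - Resultant"    # a quoted string after $ also works
v3 <- df[["Wind Speed - Resultant"]] # [[ ]] takes a character string
```

All three return the same vector; the problem in the question is not the quoting but that, in long format, no column with that name exists.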


You can use the subset argument of dcast to apply each function to only one parameter, producing separate data frames, and then merge them:

library(reshape2)
library(dplyr)
library(circular)

#cast into wide format:
wind_speed <- dcast(wind, 
                Local.Site.Name + Date.Local ~ Parameter.Name,
                value.var = "Arithmetic.Mean",
                fun.aggregate = function(x) {
                  mean(x, na.rm=TRUE)
                },
                subset=.(Parameter.Name == "Wind Speed - Resultant")
)

wind_direction <- dcast(wind, 
                    Local.Site.Name + Date.Local ~ Parameter.Name,
                    value.var = "Arithmetic.Mean",
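                    fun.aggregate = function(x) {
                      # dcast also calls this on a zero-length vector to get its
                      # default fill value, so return NA in that case
                      if(length(x) > 0) 
                        mean(circular(c(x), units="degrees"), na.rm=TRUE)
                      else
                        NA_real_
                    },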
                    subset=.(Parameter.Name == "Wind Direction - Resultant")
)


wind.w <- merge(wind_speed, wind_direction)
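The same split, aggregate, merge idea can also be sketched in base R without reshape2, using a lookup table that maps each parameter name to its own aggregation function, so adding a parameter only means adding an entry. This is an alternative technique, not what this answer uses; the toy data, the helper circ_mean_deg, and the list agg_funs are all hypothetical:

```r
# Hypothetical helper: circular mean of angles in degrees, wrapped into [0, 360)
circ_mean_deg <- function(x) {
  x <- x[!is.na(x)]
  if (length(x) == 0) return(NA_real_)
  rad <- x * pi / 180
  (atan2(mean(sin(rad)), mean(cos(rad))) * 180 / pi) %% 360
}

# Made-up long-format rows: two speeds and two directions for one site/day
wind <- data.frame(
  Local.Site.Name = "Site A",
  Date.Local      = "2016-01-01",
  Parameter.Name  = rep(c("Wind Speed - Resultant",
                          "Wind Direction - Resultant"), each = 2),
  Arithmetic.Mean = c(2, 4, 350, 10),
  stringsAsFactors = FALSE
)

# Hypothetical lookup table: one aggregation function per parameter
agg_funs <- list(
  "Wind Speed - Resultant"     = function(x) mean(x, na.rm = TRUE),
  "Wind Direction - Resultant" = circ_mean_deg
)
keys <- c("Local.Site.Name", "Date.Local")

# Aggregate each parameter with its own function (each must have >= 1 row) ...
pieces <- lapply(names(agg_funs), function(p) {
  d <- wind[wind$Parameter.Name == p, ]
  out <- aggregate(d$Arithmetic.Mean, by = d[keys], FUN = agg_funs[[p]])
  names(out)[names(out) == "x"] <- p  # name the value column after the parameter
  out
})

# ... then outer-join the pieces on the site/date keys
wind.w <- Reduce(function(a, b) merge(a, b, by = keys, all = TRUE), pieces)
```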

Alright, thanks to all of your help I managed to solve this pesky wind direction problem. Sometimes solving a problem is just a matter of knowing the right question to ask, and in my case learning the term 'vector averaging' was all I needed! The SDMTools package has a ready-made vector-averaging function, circular.averaging(), that averages wind directions and returns a result that is still between 0 and 359 degrees.

What I ended up doing was adapting tjjjohnson's script: I changed the fun.aggregate argument from mean(circular(c(x), units = "degrees"), na.rm = TRUE) to circular.averaging(x, deg = TRUE). I also plotted histograms of the raw and aggregated data, and everything is looking good. Thanks, everyone!