I'm learning data.table and am trying to subset and re-order grouped data while calculating averages, all in one data.table statement.
I got the flight data from here while following along in this tutorial.
From the tutorial,
How can we get the average arrival and departure delay for each orig,dest pair for each month for carrier code “AA”?
ans <- flights[carrier == "AA",
.(mean(arr_delay), mean(dep_delay)),
by = .(origin, dest, month)]
ans
# origin dest month V1 V2
# 1: JFK LAX 1 6.590361 14.2289157
# 2: LGA PBI 1 -7.758621 0.3103448
# 3: EWR LAX 1 1.366667 7.5000000
# 4: JFK MIA 1 15.720670 18.7430168
# 5: JFK SEA 1 14.357143 30.7500000
# ---
# 196: LGA MIA 10 -6.251799 -1.4208633
# 197: JFK MIA 10 -1.880184 6.6774194
# 198: EWR PHX 10 -3.032258 -4.2903226
# 199: JFK MCO 10 -10.048387 -1.6129032
# 200: JFK DCA 10 16.483871 15.5161290
This makes sense to me. However, I want to improve on this by organizing the flights which departed from the same origin together. Following the schema of...
data.table(subset, select/compute, group by)
I came up with this:
flights[(carrier=="AA")[order(origin, dest)],
.(Arrival_Delay=mean(arr_delay), Depart_Delay=mean(dep_delay)),
by=.(origin, dest, month)]
But my call to order doesn't seem to have done anything. Why?
I can achieve this in dplyr with:
flights %>%
filter(carrier=="AA") %>%
group_by(origin, dest, month) %>%
summarize(Arrival_Delay=mean(arr_delay), Depart_Delay=mean(dep_delay)) %>%
as.data.table()
I'm curious how to do this in data.table and why my approach did not work.
[order()]call at the end of the chain:flights[(carrier=="AA"), .(Arrival_Delay=mean(arr_delay), Depart_Delay=mean(dep_delay)), by=.(origin, dest, month)][order(origin, dest)]- SymbolixAU