I have a dataframe of below format:
ORIGIN UNIQUE_CARRIER DEST avg_arr_delay
JFK AA LAX 14.56040268
JFK B6 LAX 7.085201794
JFK DL LAX 6.475770925
JFK VX LAX 3.939759036
LAX AA MCI 2.3
LOG DL SEA 3.4
I have two nodes created in graph database for origin airport and destination airport. It is related by a property named 'delayed_by' which points to average arrival delay. The relationship between the nodes are provided below:
MERGE (origin:origin_airport {name: row.ORIGIN})
MERGE (destination:dest_airport {name: row.DEST})
MERGE (carrier:Carrier {name: row.UNIQUE_CARRIER})
MERGE (origin)-[r:delayed_by]->(destination)
SET r.arr_delay=row.avg_arr_delay
I am trying to find average delay for specific origin and destination airport. Here, trying to find between JFK and LAX. JFK to LAX has 4 arr_delay value and I need to find the average of these values. I am using the below cypher query for finding average:
MATCH (oa:origin_airport {name:'JFK'})-[r:delayed_by]->(da:dest_airport
{name:'LAX'})
RETURN oa.name AS Origin,
AVG(toFloat(r.arr_delay)) As Arrdelay,
da.name AS Destination
Instead of average, it is taking the last value 3.939759036 and returning that value. But I am expecting below value.
ORIGIN DEST Average
JFK LAX 8.0152
Also, what would be the best way to create a relationship between unique_carrier, origin and destination, linked by delayed_by relationship. By doing this I need to select a particular airlines, say AA between JFK and LAX and find the average delay. Something like below works?
MERGE (origin:origin_airport {name: row.ORIGIN})
MERGE (destination:dest_airport {name: row.DEST})
**MERGE (carrier:Carrier {name: row.UNIQUE_CARRIER})**
CREATE (origin)-[r:delayed_by]->(destination)
**CREATE (origin)-[:from]->(carrier)-[r1:delayed_by]->(destination)**
SET r.arr_delay=row.avg_arr_delay
**SET r1.arr_delay=row.avg_arr_delay**
Could someone help me in achieving this output?