I am working on a project related to flight dataset. I have a dataframe of below format: It has flight number, carrier name, origin, destination, carrier delay, weather delay, nas delay, security delay and aircraft delay details in minutes.
FL_NUM CARRIER ORIGIN DEST carr_del weather_del nas_del sec_del aircraft_del
1 AA JFK LAX 0 0 0 0 0
1 AS DCA SEA 0 0 0 0 0
1 B6 JFK FLL 12 0 12 0 0
1 HA LAX HNL 405 0 5 0 0
1 VX SFO DCA 24 20 50 0 0
1 WN ATL MDW 0 0 0 0 0
1 WN DAL HOU 27 0 0 0 0
I have formed the relationship as below in Neo4j using cypher queries:
MERGE (origin:origin_airport {name: row.ORIGIN})
MERGE (destination:dest_airport {name: row.DEST})
MERGE (carrier:Carrier {name: row.UNIQUE_CARRIER})
MERGE (flight:Flight {name: row.FL_NUM})
MERGE (flight)-[:from {flnum: row.FL_NUM}]->(origin)
MERGE (flight)-[:to {flnum: row.FL_NUM}]->(destination)
MERGE (flight)-[:operated_by {carrier: row.UNIQUE_CARRIER}]->(carrier)
MERGE (origin)-[r:delayed_by]->(destination)
SET r.carr_delay=row.carr_delay, r.weather_delay=row.weather_delay,
r.nas_delay=row.nas_delay, r.sec_delay=row.sec_delay,
r.aircraft_delay=row.aircraft_delay
MERGE (flight)-[r1:delayed_by]->(origin)
SET r1.carr_delay=row.carr_delay, r1.weather_delay=row.weather_delay,
r1.nas_delay=row.nas_delay, r1.sec_delay=row.sec_delay,
r1.aircraft_delay=row.aircraft_delay
")
The relations are:
1) Flight number linked to origin airport(ORIGIN)
2) Flight number linked to destination airport(DEST)
3) Flight number linked to Unique carrier
4) Origin airport linked by delay to destination airport.
Delay parameter holds the value of carrier delay, weather delay, nas delay,
security and late aircraft delay
5) Flight linked by delay to origin airport
Here again, delay parameter holds the value of carrier delay, weather delay,
nas delay, security and late aircraft delay
Here I am looking to answer the question Top 10 carriers - leading type of delay.
I am using the below code to get top 10 carriers with respect to flight.
MATCH (f:Flight)-[:operated_by]->(c:Carrier)
WITH c, COUNT(f) AS flights
RETURN c.name,flights
ORDER BY flights DESC
LIMIT 10
I need to take it to the next step and calculate maximum delays associated with each carrier. Here, I have the delay values specified in minutes, my query needs to calculate which delay is having higher value and return the name of the delay for that particular carrier.
From the example, if you could notice for HA, carr_del has higher value and hence the output should be like:
Carrier Cause of delay
HA Carrier delay
VX nas delay
Is it possible to achieve using cypher queries in Neo4j? Or should I need to change the structure of relationship?.
If the above result is complex, is it possible to get top carriers with respect to any specific delay atleast, like say carrier delay?. Here carrier delay has the value for all the carriers, and it should return carriers based on top values. I know it starts somewhat like below, but not sure how to end.
MATCH (c)<-[:operated_by]-(:Flight)-[r1:DELAYED_BY]
Could someone help me?