0
votes

I am working on flights dataset from BTS and have my input file in below format (providing some sample data):

FL_NO   ORIGIN  CITY    DEP_DELAY   carr_delay  weather_delay   nas_delay   

4        DFW    Dallas    10            0           12             15
5        ATL    Georgia   0            50            0             5
6        LOG    Boston    20           35           10             50

Here I need to create nodes and relationships using cypher queries in Neo4j. From this input dataset, I need to form different sets of nodes.

I need to have three different nodes for origin airport. I need to analyse how the delays occur between these airports for a particular flights.

    CREATE (DFW:Airport {city:'Dallas'}), (ATL:Airport {city:'Atlanta'}),
    (LOG:Airport {city:'Boston'})

And nodes for flights,

CREATE (flight1:Flight {flight_number:4),
(flight2:Flight {flight_number:5),
(flight3:Flight {flight_number:6)

and then need to link flights to corresponding origin and delay.

(flight4)-[:ORIGIN {dep_delay:10}]->(DFW)

Similarly, I need to link carr_delay, weather_delay and nas_delay as well.

If we need to create all these nodes manually, we can achieve them, but I am not sure, how to have these kind of nodes created from an input dataset. Could someone help me in achieving this kind of nodes and relationships?

1

1 Answers

2
votes

Based on the data model you indicate and the sample data the LOAD CSV Cypher statements would look something like this:

First create a uniqueness constraint on Airport.code:

CREATE CONSTRAINT ON (a:Airport) ASSERT a.code IS UNIQUE;

Now iterate through the CSV file, creating Airport nodes for each airport:

LOAD CSV WITH HEADERS FROM "file:///flights.csv" AS row
MERGE (a:Airport {code: row.ORIGIN})
ON CREATE SET a.city = row.CITY;

Then iterate through the CSV file again, creating a Flight node for each flight, connecting it to the origin Airport node with an ORIGIN relationship and set the delay properties on the relationship:

LOAD CSV WITH HEADERS FROM "file:///flights.csv" AS row
CREATE (f:Flight {flight_number: row.FL_NO})
MATCH (a:Airport) WHERE a.code = row.ORIGIN
MERGE (a)<-[r:ORIGIN]-(f)
SET r.dep_delay = toInt(row.DEP_DELAY),
    r.carr_delay = toInt(row.carr_delay),
    r.weather_delay = toInt(row.weather_delay),
    r.nas_delay = toInt(row.nas_delay);