1
votes

I am planning to load a train schedule and its stations into Neo4j from a CSV file.

Source Data

TrainNo  TrainName  SEQ  StationCode  Arrival  Departure  Distance
1        TN_1       1    S1           8        9          0
1        TN_1       2    S2           10       11         10
1        TN_1       3    S3           12       1          15
1        TN_1       4    S4           3        4          15
2        TN_2       1    S1
2        TN_2       2    S2
2        TN_2       3    S5
2        TN_2       4    S6
2        TN_2       5    S7
2        TN_2       6    S8

I need to build nodes and relationships like this:

S1--(TrainNo,TrainName,SEQ,Arrival,Departure,Distance)--S2--(TrainNo,TrainName,SEQ,Arrival,Departure,Distance)--S3--(TrainNo,TrainName,SEQ,Arrival,Departure,Distance)--S4

Basically, TrainNo, TrainName, SEQ, Arrival, Departure and Distance will be properties on the relationships, and those relationships together will form the train's route between the stations.

Neo4j - 3.5

1
What have you tried so far? - jose_bacoy
I have loaded all the StationCode values using MERGE, so that there are no duplicate station codes. In my past experience I have loaded data that had From and To columns, but I haven't loaded anything with sequences. - user3470294
This is a very odd CSV. Normally you would have a relationship CSV in which each row represents one relationship to create (assuming all nodes already exist), with columns such as TrainNo, TrainName, SEQ, StationCodeFrom, StationCodeTo, Arrival, Departure, Distance. That way all rows are independent of each other. With your current structure, rows depend on other rows: looking only at row 1, for example, there is no way to tell what the To station is, because that information is in another row. If you can, please create a better CSV file. - InverseFalcon

1 Answer

0
votes

You can sort the rows by train and sequence, group them by train, and then link each stop to the next one (assuming the file is called train.csv):

LOAD CSV WITH HEADERS FROM "file:///train.csv" AS row
// CSV values are strings: convert SEQ so that it sorts numerically ("10" would otherwise sort before "2")
WITH row ORDER BY row.TrainNo, toInteger(row.SEQ)
WITH row.TrainNo AS TrainNo, collect(row) AS stations
// pair each stop with the next stop of the same train
UNWIND range(0, size(stations) - 2) AS idx
WITH TrainNo, stations[idx] AS fromRow, stations[idx+1] AS toRow
MERGE (s1:Station {code: fromRow.StationCode})
MERGE (s2:Station {code: toRow.StationCode})
// depends on your model (see below)
CREATE (s1)-[:ROUTE {train: TrainNo}]->(s2);

// alternative: model each leg as a node linked to its train
LOAD CSV WITH HEADERS FROM "file:///train.csv" AS row
// convert SEQ so that it sorts numerically rather than as text
WITH row ORDER BY row.TrainNo, toInteger(row.SEQ)
WITH row.TrainNo AS TrainNo, collect(row) AS stations
MERGE (t:Train {trainNo: TrainNo})
// keep t in scope so OF_TRAIN points at the merged Train node, not a new empty node
WITH t, stations
UNWIND range(0, size(stations) - 2) AS idx
WITH t, stations[idx] AS begin, stations[idx+1] AS last
MERGE (s1:Station {code: begin.StationCode})
MERGE (s2:Station {code: last.StationCode})
CREATE (s1)-[:LEAVES]->(l:Leg)-[:ENTERS]->(s2)
CREATE (l)-[:OF_TRAIN]->(t);
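
If reshaping the file before loading is an option, as suggested in the comments, the same pairing of consecutive stops can be done outside Neo4j. The following is only a rough sketch in Python, not part of the original answer. It assumes a comma-separated file with the headers shown in the question. The helper name `to_legs` and the output column names are hypothetical. Taking Departure from the origin row and Arrival and Distance from the destination row is a guess about the intended model, so adjust it to your data.

```python
import csv
import io
from itertools import groupby


def to_legs(rows):
    """Pair each stop with the next stop of the same train (hypothetical helper)."""
    # CSV fields are strings, so SEQ is converted to int for sorting;
    # as text, "10" would sort before "2".
    ordered = sorted(rows, key=lambda r: (r["TrainNo"], int(r["SEQ"])))
    legs = []
    for train_no, group in groupby(ordered, key=lambda r: r["TrainNo"]):
        stops = list(group)
        # zip(stops, stops[1:]) yields (stop 1, stop 2), (stop 2, stop 3), ...
        for start, end in zip(stops, stops[1:]):
            legs.append({
                "TrainNo": train_no,
                "TrainName": start["TrainName"],
                "SEQ": start["SEQ"],
                "StationCodeFrom": start["StationCode"],
                "StationCodeTo": end["StationCode"],
                # Assumption: departure from the origin, arrival at the destination.
                "Departure": start["Departure"],
                "Arrival": end["Arrival"],
                "Distance": end["Distance"],
            })
    return legs


# Train 1 from the question, written as comma-separated text (assumed layout).
sample = """TrainNo,TrainName,SEQ,StationCode,Arrival,Departure,Distance
1,TN_1,1,S1,8,9,0
1,TN_1,2,S2,10,11,10
1,TN_1,3,S3,12,1,15
1,TN_1,4,S4,3,4,15
"""

rows = list(csv.DictReader(io.StringIO(sample)))
legs = to_legs(rows)
for leg in legs:
    print(leg["StationCodeFrom"], "->", leg["StationCodeTo"])
```

Each resulting row is independent of the others, so it could be written out with `csv.DictWriter` and loaded with one relationship per row, in the From/To style the asker has used before.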