0
votes

I have a table of senders and receivers:

Table A:

+------+--------+                                                          
|sender|receiver|
+------+--------+ 
|     A|       B|
|     B|       C|
|     C|       D|
|     D|       A|
|     E|       B|
|     A|       D|
|     E|       C|
+------+--------+  

I want to create a graph with connections from 'sender' to 'receiver'. For example, A -(rel)-> B

The values in the sender column can also appear in the receiver column. However, I do not want the nodes A,B,C,D, and E to be repeated in my graph. They must only appear once in the final graph.

I tried doing this:

LOAD CSV WITH HEADERS FROM 'file:///graph' AS row
CREATE (:sender_node { sender: row.sender})

MATCH (n:sender_node)
WITH n.sender AS sender, collect(n) AS nodes
WHERE size(nodes) > 1
FOREACH (n in tail(nodes) | DELETE n)

LOAD CSV WITH HEADERS FROM 'file:///graph' AS row
CREATE (:receiver_node { receiver: row.receiver})

MATCH (n:receiver_node)
WITH n.receiver AS receiver, collect(n) AS nodes
WHERE size(nodes) > 1
FOREACH (n in tail(nodes) | DELETE n)

LOAD CSV WITH HEADERS FROM 'file:///graph' AS row
MATCH (from_var:sender_node {sender: row.sender}),(to_var:receiver_node {receiver:row.receiver})
CREATE (from_var)-[:rel]->(to_var)
RETURN *

I basically deleted the duplicates separately in sender and receiver nodes. But since the nodes A, B, C, and D appear in both 'sender' and 'receiver' columns, they appear twice in the graph. I want to correct this part.

1

1 Answers

1
votes

The start node of an EMAILS relationship is implicitly a sender. And the end node of an EMAILS relationship is implicitly a receiver. So you should not have different labels for the sender and receiver node, which is redundant and makes your data model overly complex.

Instead, you should just use, say, the User label for both kinds of nodes. That will also help you to avoid duplicates, especially when you use the MERGE clause instead of CREATE.

For example:

LOAD CSV WITH HEADERS FROM 'file:///graph' AS row
MERGE (sender:User {id: row.sender})
MERGE (receiver:User {id: row.receiver})
MERGE (sender)-[:EMAILS]->(receiver)
RETURN *