
I have a simple, large CSV file with no headers, structured like this:

name1, name2
name3, name4
name2, name4
...

I'm trying to import it all into Neo4j and create the relationships at the same time. First I added the constraint

CREATE CONSTRAINT ON (u:User) ASSERT u.name IS UNIQUE

and then I ran:

USING PERIODIC COMMIT
LOAD CSV FROM '${file}' AS line
WITH line LIMIT 50000
MERGE (u:User {name: line[0]})-[:connected_to]->(q:User {name: line[1]})

The graph I get consists only of connected pairs. I cannot find a single node with more than one relationship, even though many names appear several times in both the left and right columns. I also expected to see some clusters.

Clearly I'm doing something wrong with my insertion. I assume I could go through the file twice, first creating all the nodes and then all the relationships, but I feel like I'm missing something simple that does it all in one operation.
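A minimal sketch of that two-pass idea, reusing the '${file}' placeholder, the User label and the connected_to type from the query above. The first statement creates every user from both columns; the second only looks the users up and links them. The trim() calls are an assumption that strips a space after the comma. As the answer below shows, a single pass with separate MERGE clauses is enough, so this is only an alternative.

```cypher
USING PERIODIC COMMIT
LOAD CSV FROM '${file}' AS line
WITH line LIMIT 50000
// Pass 1: create each user once, taking names from both columns
UNWIND [trim(line[0]), trim(line[1])] AS name
MERGE (:User {name: name});

USING PERIODIC COMMIT
LOAD CSV FROM '${file}' AS line
WITH line LIMIT 50000
// Pass 2: find the two existing users and link them
MATCH (u:User {name: trim(line[0])})
MATCH (q:User {name: trim(line[1])})
MERGE (u)-[:connected_to]-(q);
```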

Correction: I had mistakenly written one of the property names as 'number'; both are 'name'.

I think you need to show an example of the input data on which the problem appears. - stdob--
Do you have a small snippet of data that demonstrates the problem? - Dave Bennett
@DaveBennett It's easy to create: take 4 names and have one of them repeat randomly in either of the 2 columns to form pairs, e.g. john,jake / john,mike / david,john / amy,john. Even with this sample I end up with only connected pairs... - Traveling Tech Guy
Fair enough. I created a small sample and ran the query from my answer, and it created a single john node with all of the others attached. I did add trim() to each name so that any leading space after the comma is removed. - Dave Bennett
@DaveBennett OK, I tried it with a small file and your code worked like a charm! My guess is that something fails during the bigger file import. I'll do some log diving, but your answer works. Thanks for all your help! - Traveling Tech Guy
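The trim() remark can matter for this symptom: in a row like "name1, name2" the second field may keep its leading space, and then " john" from the right column and "john" from the left column are different values, so they become different nodes. A hedged check on an already-imported graph, assuming the User label and name property from the question: the first query lists names with leading or trailing whitespace, and the second counts how many nodes share the same trimmed name.

```cypher
// Names that carry leading or trailing whitespace
MATCH (u:User)
WHERE u.name <> trim(u.name)
RETURN u.name AS name
LIMIT 25;

// Trimmed names that are shared by more than one node
MATCH (u:User)
WITH trim(u.name) AS cleaned, count(*) AS copies
WHERE copies > 1
RETURN cleaned, copies
ORDER BY copies DESC
LIMIT 25;
```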

1 Answer


You need to create the nodes individually first. MERGE matches or creates the entire pattern as a whole: if the full pattern does not already exist, every part of it, including both nodes, is created. As a result, each row of your file becomes its own isolated pair.

If you MERGE each name in the line first and then MERGE the relationship afterwards, you will get the connected graph you want. Note that the relationship MERGE below is undirected. This ensures that only a single relationship is created between two particular nodes, regardless of their order in the file or the number of times the pair occurs.

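USING PERIODIC COMMIT
LOAD CSV FROM '${file}' AS line
WITH line LIMIT 50000
// MERGE each user on its own, so a name seen in an earlier row is reused
MERGE (u:User {name: trim(line[0])} )
MERGE (q:User {name: trim(line[1])} )
// Undirected MERGE: at most one relationship per pair, whatever the order
MERGE (u)-[:connected_to]-(q)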
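USING PERIODIC COMMIT and the CREATE CONSTRAINT ... ASSERT form were removed in Neo4j 5. A sketch of the same constraint and import for Neo4j 4.4 or later batches the rows with CALL { ... } IN TRANSACTIONS instead. The constraint name user_name_unique and the batch size of 1000 rows are arbitrary choices, and '${file}' is the same placeholder as above. IN TRANSACTIONS must run in an implicit (auto-commit) transaction, so run the two statements separately and, in Neo4j Browser, prefix the import with :auto.

```cypher
// Neo4j 4.4+/5 constraint syntax (the constraint name is arbitrary)
CREATE CONSTRAINT user_name_unique IF NOT EXISTS
FOR (u:User) REQUIRE u.name IS UNIQUE;

// Same node-first MERGE logic, committed in batches of 1000 rows
LOAD CSV FROM '${file}' AS line
WITH line LIMIT 50000
CALL {
  WITH line
  MERGE (u:User {name: trim(line[0])})
  MERGE (q:User {name: trim(line[1])})
  MERGE (u)-[:connected_to]-(q)
} IN TRANSACTIONS OF 1000 ROWS;
```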

If your data contains entries like these, where the same pair appears in both orders:

...
name1, name2
name2, name1
...

and you want a relationship created in each direction, make the relationship MERGE directed, as in the following example:

USING PERIODIC COMMIT
LOAD CSV FROM '${file}' AS line
WITH line LIMIT 50000
MERGE (u:User {name: trim(line[0])} )
MERGE (q:User {name: trim(line[1])} )
// Directed MERGE: "a, b" and "b, a" each get their own relationship
MERGE (u)-[:connected_to]->(q)
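To confirm that an import produced connected nodes rather than isolated pairs, a query along these lines lists the best-connected users. It is a sketch that assumes the User label and connected_to type used above; if every user shows a degree of 1, the graph is still made of isolated pairs.

```cypher
// Users ranked by how many connected_to relationships they have
MATCH (u:User)-[r:connected_to]-()
RETURN u.name AS name, count(r) AS degree
ORDER BY degree DESC
LIMIT 10
```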