1
votes

I am creating nodes and relationships in Neo4j through a csv. I have three columns, 2 of these have null values. Any time I put 'WHERE NOT line.column_name IS null' it is not giving the graph for the other nodes that has been populated. It seems like is it not including that line at all where there is null. I am brand new to cypher and graph database concepts.

Sample csv data:

manager_name,contractor_name,company_name
"John Smith","Matt Potter",Google
"John Smith","Ann Potter",Google
"Andy Jacobs","Matt Potter",Apple
"Seth Jacobs","Mike Smith",
"Seth Jacobs",,LinkedIn

Query Using:

LOAD CSV WITH HEADERS FROM "file:///sample2.csv" AS line
WITH line WHERE NOT line.company_name IS NULL
WITH line WHERE NOT line.contractor_name IS NULL
MERGE (mn:manager_name {name:line.manager_name})
MERGE (cn:contractor_name {name:line.contractor_name})
MERGE (cpn:company_name {name:line.company_name})
MERGE (mn)-[:KNOWS]->(cn)
MERGE (cn)-[:AFFILIATED]->(cpn)

Output Question: Given the relationship that I have created, I think its obvious to me why it won't create nodes for "Seth Jacobs",,LinkedIn because there is null for contractor_name. But how do I make connection "Seth Jacobs","Mike Smith", while contract_number is not null here?

Seth Jacobs -- KNOWS --> Mike Smith

Does all rows have to be populated to create the nodes in neo4j? This is the image I get from the query/merge/loads from above.

enter image description here

2

2 Answers

0
votes

The APOC procedure apoc.do.case would be helpful here.

LOAD CSV WITH HEADERS FROM "file:///sample2.csv" AS line
MERGE (mn:manager_name {name: line.manager_name})
WITH mn, line
CALL apoc.do.case(
  [line.contractor_name IS NOT NULL AND line.company_name IS NOT NULL,
     "MERGE (cn:contractor_name {name:line.contractor_name})
      MERGE (cpn:company_name {name:line.company_name})
      MERGE (mn)-[:KNOWS]->(cn)
      MERGE (cn)-[:AFFILIATED]->(cpn)
      RETURN cn, cpn",
   line.contractor_name IS NOT NULL,
     "MERGE (cn:contractor_name {name:line.contractor_name})
      MERGE (mn)-[:KNOWS]->(cn)
      RETURN cn"
  ],
  '',
  {mn: mn, line: line}) YIELD value AS result
RETURN mn, result

In this sample query, the returned result would:

  • contain cn and cpn if a line contained both contractor and company names, or
  • contain just cn if the line only had a contractor name, or
  • be empty otherwise.
0
votes

You're using the same CSV for merging multiple nodes and merging relationships, you're doing too much at the same time.

Instead, try doing multiple passes, once per node type (with the appropriate filtering if the value is null), and then when all your nodes are created, do another set of passes for the relationships, using MATCH instead of MERGE.

Another option (for the future most likely) is to have multiple separate CSVs, one per node type, and then a CSV per relationship.