I am unable to create relationships when importing .csv files into Neo4j.

The nodes I have are for Medical Providers and Medical Conditions

The relationship is Provider-[TREATS]->Condition

Here is a subset of my providers csv:

Provider,ProviderID,Office,Street,City,State,Zip,Phone
Dr. Mxxxxi MD,1,The xxx Hospital,1xxx xxx Hwy,Ft Wright,KY,4xxxxx,(xxx) xxx-3304

Here is a subset of my conditions csv:

condition,conditionID
Acute Leukemia,1
Acute Lymphoid Leukemia,2
Acute Myeloid Leukemia,3
Adrenal Gland Cancer,4
....

Here is a subset of my relations csv:

ProviderID,ConditionID
1,1
1,2
1,3
1,4
1,5
1,6
1,7
1,8
1,9
...

Here are the import/create statements:

// Create providers
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:///providers.csv" AS row
CREATE (:Provider {provider: row.Provider, providerID: row.ProviderID, officeName: row.OfficeName, street:row.Street,  city:row.City, state:row.State,  zip:row.Zip,  phone: row.Phone});

Added 1 label, created 1 node, set 7 properties, statement completed in 283 ms

// Create conditions
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:///conditions.csv" AS row
CREATE (:Condition {Condition: row.condition, ConditionID: row.conditionID});

Added 100 labels, created 100 nodes, set 200 properties, statement completed in 262 ms.

I created indexes:

CREATE INDEX ON :Provider(providerID);
CREATE INDEX ON :Condition(conditionID);

This is the import/create relationship statement and result:

USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:///ProviderConditionsTreated.csv" AS row
MATCH (p:Provider { providerID: row.ProviderID})
WITH p
MATCH (c:Condition { conditionID: p.ConditionID})
CREATE (p)-[t:TREATS]->(c);

(no changes, no records)

I have also tried this, again with no records created:

MATCH (p:Provider { providerID: row.ProviderID})
MATCH (c:Condition { conditionID: row.ConditionID})
CREATE (p)-[t:TREATS]->(c);

(no changes, no records)

1 Answer

I see two issues with your import query:

  1. The Cypher language is case-sensitive for labels, relationship types and property names. In your providers.csv file, the ProviderID header starts with an uppercase character, but in the conditions.csv file, the conditionID header starts with a lowercase one. Your LOAD CSV statements then store them as the properties providerID and ConditionID, so the casing is flipped relative to the CSV headers, and a MATCH on conditionID finds no Condition nodes. It's best to keep these consistent both for the CSV files and for the node properties.

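  2. You should not use WITH p, because after it only p remains in scope and you can no longer access the row variable. Your first query works around this by reading p.ConditionID, but Provider nodes have no such property, so nothing matches. Your last query reads row.ConditionID again, which fixes this part. However, even that can be simplified by using a single MATCH clause.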
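
A quick way to confirm the first point is to look at the property keys the nodes actually carry. This is only a sketch: it assumes the Provider and Condition nodes were created with the statements from the question, and the aliases providerKeys and conditionKeys are names picked here for illustration. Run each statement separately.

```cypher
// List the property keys stored on one Provider node.
MATCH (p:Provider) RETURN keys(p) AS providerKeys LIMIT 1;

// List the property keys stored on one Condition node.
MATCH (c:Condition) RETURN keys(c) AS conditionKeys LIMIT 1;
```

Whatever casing appears in those lists is the casing the MATCH patterns have to use.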

In conclusion, the following query worked for me:

USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:///relations.csv" AS row
MATCH
  (p:Provider { providerID: row.ProviderID}),
  (c:Condition { ConditionID: row.ConditionID})
CREATE (p)-[t:TREATS]->(c);

Created 4 relationships, completed after 110 ms.
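
One related detail, which I have not tested against your data: the index in the question was created on :Condition(conditionID), while the nodes store the property as ConditionID, so that index most likely does not help the lookup above. A sketch of the matching index, assuming the same Neo4j version and legacy index syntax as in the question:

```cypher
// Index the property name that the Condition nodes actually carry.
CREATE INDEX ON :Condition(ConditionID);
```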