0
votes

I am trying to import a csv file into Neo4j DB, here is the link for the CSV file http://apps.dealopia.com/offshoreleaks/offshore_leaks_csvs-20170104/Intermediaries.csv

The query that I am using to import the CSV data into DB is

USING PERIODIC COMMIT 10000
LOAD CSV FROM "http://apps.dealopia.com/offshoreleaks/offshore_leaks_csvs-20170104/Intermediaries.csv" AS line
MERGE (i:Intermediateries{name:line[1],internal_id:line[2],address:line[3],valid_until:line[4],country_codes:line[5],countries:line[6],status:line[7],node_id:line[8],sourceID:line[9]})

The Query produces the following error:- Cannot merge node using null property value for sourceID

I found some useful resources but given the size of ICIJ Panama Papers Size it will take hours to execute, is there any way to eliminate checking for NULL values and optimize the query?

2

2 Answers

2
votes

You are using the indexing wrong. Index starts from 0, not 1. Also there is a header line in the given csv. So edit the Cypher query accordingly.

Change it to the following:

USING PERIODIC COMMIT 10000 
LOAD CSV WITH HEADERS FROM "http://apps.dealopia.com/offshoreleaks/offshore_leaks_csvs-20170104/Intermediaries.csv" AS line 
MERGE 
(i:Intermediateries{name:line[0],internal_id:line[1],address:line[2],valid_until:line[3],country_codes:line[4],countries:line[5],status:line[6],node_id:line[7],sourceID:line[8]})

See the official documentation for more details - http://neo4j.com/docs/developer-manual/current/cypher/clauses/load-csv/#load-csv-import-data-from-a-csv-file

1
votes

yes you can easily filter out rows with sourceID null

USING PERIODIC COMMIT 10000 LOAD CSV FROM  "http://apps.dealopia.com/offshoreleaks/offshore_leaks_csvs-20170104/Intermediaries.csv" AS line 
WITH line where line[9] is not null
MERGE (i:Intermediateries{name:line[1],internal_id:line[2],address:line[3],
valid_until:line[4],country_codes:line[5],countries:line[6],status:line[7],node_id:line[8],sourceID:line[9]})

You can also use coalesce if you want to import those nodes, even though they do not have source ID

USING PERIODIC COMMIT 10000 LOAD CSV FROM  "http://apps.dealopia.com/offshoreleaks/offshore_leaks_csvs-20170104/Intermediaries.csv" AS line 
WITH line,coalesce(line[9],"NoId") as sourceID
MERGE (i:Intermediateries{name:line[1],internal_id:line[2],address:line[3],
valid_until:line[4],country_codes:line[5],countries:line[6],status:line[7],node_id:line[8],sourceID:sourceID})

I would create a query like this if I imported this if you have apoc plugin

USING PERIODIC COMMIT 10000 LOAD CSV WITH HEADERS FROM  "http://apps.dealopia.com/offshoreleaks/offshore_leaks_csvs-20170104/Intermediaries.csv" AS line 
MERGE (i:Intermediateries{internal_id:line.internal_id})
ON CREATE SET i += apoc.map.clean(row.properties,['internal_id'],[])

if you do not have apoc plugin you have to specify properties manually

USING PERIODIC COMMIT 10000 LOAD CSV WITH HEADERS FROM  "http://apps.dealopia.com/offshoreleaks/offshore_leaks_csvs-20170104/Intermediaries.csv" AS line 
MERGE (i:Intermediateries{internal_id:line.internal_id})
ON CREATE SET i.name = line.name,i.address = line.address, i.valid_until = line.valid_until,
i.country_codes = line.country_code, i.countries = line.countries,i.status = line.status,i.node_id = line.node_id,
i.sourceID = line.sourceID,i.note = line.note