I use the neo4j-admin import tool to bulk-import data in CSV format. I use Integer as the ID datatype in the header [journal:ID:int(Journal-ID)], and importing the nodes works fine. When the import tool gets to the relationships, I get an error saying that the referenced node is missing. It seems the relationship import is looking up the ID as a String. I already tried changing the type of the ID in the relationship file as well, but got another error. I found no way to specify the ID as int in the relationship file.

Here is a minimal example. Let's say we have two node types with the headers:

journal:ID:int(Journal-ID)

and

documentID:ID(Document-ID),title

and the example files journal.csv:

"123"
"987"

and document.csv:

"PMID:1", "Title"
"PMID:2", "Other Title"

We also have a relation "hasDocument" with the header:

:START_ID(Journal-ID),:END_ID(Document-ID)

and the example file relation.csv:

"123", "PMID:1"

When running the import I get the error:

Error in input data
Caused by:123 (Journal-ID)-[hasDocument]->PMID:1 (Document-ID) referring to missing node 123

I tried to specify the relation header as

:START_ID:int(Journal-ID),:END_ID(Document-ID)

but this also produces an error.

The command to start the import is:

neo4j-admin import --nodes:Document="document-header.csv,documentNodes.csv" --nodes:Journal="journal-header.csv,journalNodes.csv" --relationships:hasDocument="hasDocument-header.csv,relationsHasDocument.csv"

Is there a way to specify the ID in the relationship file as Integer, or is there another solution to this problem?

1 Answer

It doesn't seem to be supported. The documentation doesn't mention it, and the code has no test case for it.

You could import the data with String IDs and cast them to integers after you start the database:

MATCH (j:Journal)
SET j.id = toInteger(j.id)
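
For the import step itself, here is a sketch of what the journal header from the question might look like with the default String ID type, assuming all other files stay as posted. The only change is dropping the :int suffix. The relationship header keeps :START_ID(Journal-ID) as it is, so "123" in relation.csv would be matched against the string "123":

```
journal:ID(Journal-ID)
```

Note that with this header the ID is stored in a property named journal, so the cast above would need to target j.journal rather than j.id.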

If your dataset is large, you can use APOC's apoc.periodic.iterate to run the update in batches:

call apoc.periodic.iterate("
MATCH (j:Journal) RETURN j
","
SET j.id = toInteger(j.id)
",{batchSize:10000})