
For development purposes I'm trying to upload data into neo4j-community-2.2.0-M03. The upload is done from several CSV files using shell and Cypher scripts. These are exactly the same scripts I used to successfully upload the data into neo4j-2.1.7. Neo4j 2.1.7 uploads the 201589 nodes and 2163494 edges in 9 minutes.

Neo4j-2.2.0-M03 correctly uploads the first three types of nodes (26k nodes) but then fails to upload the last two files, returning "NotInTransactionException: The statement has been closed."

The Cypher commands are the following:

CREATE INDEX ON :Fingerprint(Code);
CREATE INDEX ON :Fingerprint(Size);
USING PERIODIC COMMIT LOAD CSV WITH HEADERS FROM 'mypath_to_file.csv' 
  as csvLine FIELDTERMINATOR '\t' 
MATCH (m:Molecule {NSC: toInt(csvLine.NSC)}) 
MERGE (f:Fingerprint {Code: toInt(csvLine.Identifier), Size: toInt(csvLine.Distance)}) 
CREATE (m)-[:hasBit {Type: 'SCFP6', Atoms: csvLine.Atoms}]-> (f);

The data in the tab-separated CSV file look like this:

NSC Identifier  Distance    Atoms   Substructure
128 1   0   1   [*]C(=[*])[*]
128 0   0   2   [*]C
128 13  0   3   [*]=O
128 9   0   4   [*]N([*])[*]
128 3   0   5   [*][c](:[*]):[*]
128 17  0   12  [*]S[*]
128 1256995004  2   1 2 3 4 [*]N([*])C(=O)C
128 136627117   2   1 2 [*]C(=[*])C
128 1311071855  2   1 3 [*]C(=O)[*]

Has anything changed between 2.1.7 and 2.2.0-M03 with regard to CSV data upload? Is this a known bug?

Please don't hesitate to ask if you need additional information, example files, or scripts.

Is this still an issue with 2.2.0-M04 (released this week)? - Stefan Armbruster
In addition, if I extract (head) the first 999 lines of the CSV file, the upload works; with the first 1000 lines it fails. Row 1000 is: 1163 1 0 2 [*]C(=[*])[*], so nothing different from the others. I'll check with M04 shortly. - Pierre
Same issue, same error with M04. - Pierre
Please open a GitHub issue at github.com/neo4j/neo4j/issues/new. - Stefan Armbruster
Can you also share your path/to/neo/data/graph.db/messages.log file? - Michael Hunger

2 Answers


Can you try running:

LOAD CSV WITH HEADERS FROM 'mypath_to_file.csv' 
  as csvLine FIELDTERMINATOR '\t' 
RETURN count(*);

and

LOAD CSV WITH HEADERS FROM 'mypath_to_file.csv' 
  as csvLine FIELDTERMINATOR '\t' 
RETURN csvLine
SKIP 25900;

to see whether something is off with the data or the parsing? Perhaps an exception happens during parsing that rolls the transaction back.
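The same kind of check can also be done outside Neo4j. Below is a minimal Python sketch, assuming the file is tab-separated with the header row shown in the question (NSC, Identifier, Distance, Atoms, Substructure). It reports rows whose field count differs from the header, rows where NSC, Identifier or Distance (the columns passed through toInt in the Cypher script) are not plain integers, and fields containing a double quote, which LOAD CSV treats as a quoting character. The script and function name `find_bad_rows` are hypothetical, chosen for illustration.

```python
import csv
import re
import sys

# Columns the Cypher script converts with toInt() (taken from the question's header).
INT_COLUMNS = ("NSC", "Identifier", "Distance")
# Strict integer check: optional minus sign followed by digits only.
INT_PATTERN = re.compile(r"-?\d+")


def find_bad_rows(lines):
    """Yield (line_number, reason) for rows that look malformed.

    `lines` is any iterable of text lines (an open file or a list of strings).
    Line numbers are 1-based and the header counts as line 1.
    """
    # QUOTE_NONE: read quote characters literally so they can be reported.
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    header = next(reader, None)
    if header is None:  # empty file: nothing to check
        return
    for line_no, row in enumerate(reader, start=2):
        if len(row) != len(header):
            yield line_no, f"expected {len(header)} fields, got {len(row)}"
            continue
        record = dict(zip(header, row))
        for col, value in record.items():
            if '"' in value:
                yield line_no, f"{col} contains a double quote"
        for col in INT_COLUMNS:
            value = record.get(col, "")
            if not INT_PATTERN.fullmatch(value):
                yield line_no, f"{col}={value!r} is not an integer"


if __name__ == "__main__":
    # Usage: python find_bad_rows.py mypath_to_file.csv
    with open(sys.argv[1], newline="", encoding="utf-8") as handle:
        for line_no, reason in find_bad_rows(handle):
            print(f"line {line_no}: {reason}")
```

No output means every row has the expected shape. If nothing is reported around line 1000, the problem is more likely in the import itself than in the data.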

I would change your MERGE to:

   MERGE (f:Fingerprint {Code: toInt(csvLine.Identifier)})
     ON CREATE SET f.Size=toInt(csvLine.Distance)

The issue has been fixed in Neo4j 2.2.0-RC01. The full load of 201589 nodes and 2163494 edges now takes 3 minutes, versus 9 minutes with Neo4j 2.1.7.