
I have an Oracle table with >200M rows that I want to load into Neo4j using apoc.load.jdbc. How can this be done without exhausting memory? Basically, I want to perform the equivalent of:

USING PERIODIC COMMIT
CALL apoc.load.jdbc('alias','table_name') YIELD row
MATCH (r:Result {result_id:row.RESULT_ID})
MATCH (g:Gene   {gene_id:row.ENTITY_ID})
CREATE (r)-[:EXP {expression_level:row.EXPRESSION_LEVEL}]->(g)

However, USING PERIODIC COMMIT only seems to work with LOAD CSV, and it gives the following error when I try it with apoc.load.jdbc:

Invalid input 'c': expected whitespace, comment or LoadCSVQuery (line 2,column 1 (offset: 30)) "call apoc.load.jdbc('oasis_whs', 'neo4j_exp') yield row"

I've looked at both apoc.periodic.iterate and apoc.periodic.commit, but the former attempts to read the entire table into memory first and then iterate, while the latter repeats the same query over and over, which won't work in this use case.

The Oracle table is partitioned, and I can apply a WHERE filter to load one partition at a time; however, some partitions still contain more data than will fit into memory.
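For reference, the per-partition version looks roughly like this (a sketch only; part_key and 'P2020' stand in for my real partition column and value):

// Sketch: apoc.load.jdbc accepts a SQL statement plus a parameter list
// instead of a table name; part_key and 'P2020' are placeholders.
CALL apoc.load.jdbc('alias',
     'SELECT * FROM table_name WHERE part_key = ?', ['P2020']) YIELD row
MATCH (r:Result {result_id:row.RESULT_ID})
MATCH (g:Gene   {gene_id:row.ENTITY_ID})
CREATE (r)-[:EXP {expression_level:row.EXPRESSION_LEVEL}]->(g)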

I can't be the first one with this issue, can I?

Thanks in advance.


1 Answer


Might be a bit late, but for others: I had the same issue with a large query killing my machine, and using apoc.periodic.iterate helped dramatically. You can play around with the batch size to see what works for you. The retries parameter reruns any failed batches (perhaps another part of the query needs to have run before the failed part will complete).

CALL apoc.periodic.iterate(
  "CALL apoc.load.jdbc('alias','table_name') YIELD row",
  "MATCH (r:Result {result_id:row.RESULT_ID})
   MATCH (g:Gene   {gene_id:row.ENTITY_ID})
   CREATE (r)-[:EXP {expression_level:row.EXPRESSION_LEVEL}]->(g)",
  {batchSize:10000, iterateList:true, parallel:true, retries:20})
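Two caveats worth adding. First, with parallel:true, concurrent batches can contend for write locks on the same Result or Gene nodes while creating relationships, which is typically why batches fail and need retries in the first place; if you see many failed batches, try parallel:false at the cost of some speed. Second, since the question mentions partitions, the same pattern can target one partition at a time by passing a SQL statement instead of a table name (again a sketch; part_key and 'P2020' are placeholders for the actual partition column and value):

CALL apoc.periodic.iterate(
  "CALL apoc.load.jdbc('alias', 'SELECT * FROM table_name WHERE part_key = ?', ['P2020']) YIELD row",
  "MATCH (r:Result {result_id:row.RESULT_ID})
   MATCH (g:Gene   {gene_id:row.ENTITY_ID})
   CREATE (r)-[:EXP {expression_level:row.EXPRESSION_LEVEL}]->(g)",
  {batchSize:10000, parallel:false, retries:20})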