I have a large network of over 15 million nodes. I want to remove the property "CONTROL" from all of them using a Cypher query in the neo4j-shell.

If I try to execute any of the following:

  • MATCH (n) WHERE has(n.`CONTROL`) REMOVE n.`CONTROL` RETURN COUNT(n);
  • MATCH (n) WHERE has(n.`CONTROL`) REMOVE n.`CONTROL`;
  • MATCH (n) REMOVE n.`CONTROL`;

the system returns:

Error occurred in server thread; nested exception is: java.lang.OutOfMemoryError: Java heap space

Even the following query gives the OutOfMemoryError:

  • MATCH (n) REMOVE n.`CONTROL` RETURN n.`ID` LIMIT 10;

As a test, the following does execute properly:

  • MATCH (n) WHERE has(n.`CONTROL`) RETURN COUNT(n);

returning 16636351.

Some details:

The memory limit depends on the following settings:

  • wrapper.java.maxmemory (conf/neo4j-wrapper.conf)
  • neostore..._memory (conf/neo4j.properties)

Setting these values to a total of 28 GB across both files results in a java_pidXXX.hprof heap dump of about 45 GB (with wrapper.java.additional=-XX:+HeapDumpOnOutOfMemoryError set).
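For reference, this is roughly what those settings look like; the keys are Neo4j 2.1 setting names, but the values below are purely illustrative, not the ones I used or a recommendation:

```
# conf/neo4j-wrapper.conf
wrapper.java.initmemory=4096
wrapper.java.maxmemory=4096
wrapper.java.additional=-XX:+HeapDumpOnOutOfMemoryError

# conf/neo4j.properties
neostore.nodestore.db.mapped_memory=1G
neostore.relationshipstore.db.mapped_memory=2G
neostore.propertystore.db.mapped_memory=4G
```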

The only clue I could google was:

...you use the Neo4j-Shell which is just an ops tool and just collects the data in memory before sending back, it was never meant to handle huge result sets.

Is it really not possible to remove properties in large networks using the neo4j-shell and cypher? Or what am I doing wrong?

PS

Additional information:

  • Neo4j version: 2.1.3

  • Java versions: Java(TM) SE Runtime Environment (build 1.7.0_76-b13) and OpenJDK Runtime Environment (IcedTea 2.5.4) (7u75-2.5.4-1~trusty1)

  • The database is 7.4 GB (16636351 nodes, 14724489 relationships)

  • The property "CONTROL" is empty, i.e., it has been defined on all the nodes without a value actually being assigned.

An example of the exception from data/console.log:

java.lang.OutOfMemoryError: Java heap space
Dumping heap to java_pid20541.hprof ...
Dump file is incomplete: file size limit
Exception in thread "GC-Monitor"
Exception in thread "pool-2-thread-2" java.lang.OutOfMemoryError: Java heap space
    at java.util.Arrays.copyOf(Arrays.java:2271)
    at java.lang.StringCoding.safeTrim(StringCoding.java:79)
    at java.lang.StringCoding.access$300(StringCoding.java:50)
    at java.lang.StringCoding$StringEncoder.encode(StringCoding.java:305)
    at java.lang.StringCoding.encode(StringCoding.java:344)
    at java.lang.StringCoding.encode(StringCoding.java:387)
    at java.lang.String.getBytes(String.java:956)
    at ch.qos.logback.core.encoder.LayoutWrappingEncoder.convertToBytes(LayoutWrappingEncoder.java:122)
    at ch.qos.logback.core.encoder.LayoutWrappingEncoder.doEncode(LayoutWrappingEncoder.java:135)
    at ch.qos.logback.core.OutputStreamAppender.writeOut(OutputStreamAppender.java:194)
    at ch.qos.logback.core.FileAppender.writeOut(FileAppender.java:209)
    at ch.qos.logback.core.OutputStreamAppender.subAppend(OutputStreamAppender.java:219)
    at ch.qos.logback.core.OutputStreamAppender.append(OutputStreamAppender.java:103)
    at ch.qos.logback.core.UnsynchronizedAppenderBase.doAppend(UnsynchronizedAppenderBase.java:88)
    at ch.qos.logback.core.spi.AppenderAttachableImpl.appendLoopOnAppenders(AppenderAttachableImpl.java:48)
    at ch.qos.logback.classic.Logger.appendLoopOnAppenders(Logger.java:273)
    at ch.qos.logback.classic.Logger.callAppenders(Logger.java:260)
    at ch.qos.logback.classic.Logger.buildLoggingEventAndAppend(Logger.java:442)
    at ch.qos.logback.classic.Logger.filterAndLog_0_Or3Plus(Logger.java:396)
    at ch.qos.logback.classic.Logger.warn(Logger.java:709)
    at org.neo4j.kernel.logging.LogbackService$Slf4jToStringLoggerAdapter.warn(LogbackService.java:243)
    at org.neo4j.kernel.impl.cache.MeasureDoNothing.run(MeasureDoNothing.java:84)
java.lang.OutOfMemoryError: Java heap space
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.addConditionWaiter(AbstractQueuedSynchronizer.java:1857)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
    at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1079)
    at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:807)
    at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1068)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Exception in thread "Statistics Gatherer[primitives]" java.lang.OutOfMemoryError: Java heap space
Exception in thread "RMI RenewClean-[10.65.4.212:42299]" java.lang.OutOfMemoryError: Java heap space
Exception in thread "RMI RenewClean-[10.65.4.212:43614]" java.lang.OutOfMemoryError: Java heap space


1 Answer


See here: http://jexp.de/blog/2013/05/on-importing-data-in-neo4j-blog-series/

To update data with Cypher it is also necessary to take transaction size into account. For the embedded case, batching transactions is discussed in the next installment of this series. For the remote execution via the Neo4j REST API there are a few important things to remember. Especially with large index lookups and match results, it might happen that the query updates hundreds of thousands of elements. Then a paging mechanism using WITH and SKIP/LIMIT can be put in front of the updating operation.

MATCH (m:Movie)<-[:ACTED_IN]-(a:Actor)
WITH a, count(*) AS cnt
SKIP {offset} LIMIT {pagesize}
SET a.movie_count = cnt
RETURN count(*)

Run with pagesize=20000 and increasing offset=0,20000,40000,… until the query returns a count < pagesize

So in your case, repeat this until it returns 0 rows. You can also increase the limit to 1M.

MATCH (n) WHERE has(n.`CONTROL`) 
WITH n
LIMIT 100000
REMOVE n.`CONTROL` 
RETURN COUNT(n);
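The batch-and-repeat pattern this query implements (take at most N matching nodes, remove the property, run again until nothing is left) can be sketched outside Neo4j. Below is a minimal Python simulation where plain dicts stand in for nodes; `remove_property_batched` and the node list are illustrative constructs, not Neo4j API, but they show why each pass only ever holds one batch's worth of work:

```python
def remove_property_batched(nodes, prop, batch_size):
    """Remove `prop` from every dict in `nodes`, at most `batch_size` per pass.

    Mirrors: MATCH (n) WHERE has(n.prop) WITH n LIMIT batch_size REMOVE n.prop
    repeated until no matching nodes remain. Returns the total number removed.
    """
    total = 0
    while True:
        # One "query": collect up to batch_size nodes that still have the property.
        batch = [n for n in nodes if prop in n][:batch_size]
        if not batch:
            return total  # nothing left to remove; the outer loop would stop here
        for n in batch:
            del n[prop]  # the REMOVE step, bounded to this batch
        total += len(batch)

# Example: 250 "nodes" cleaned in batches of 100 -> three passes (100, 100, 50).
nodes = [{"CONTROL": None, "ID": i} for i in range(250)]
removed = remove_property_batched(nodes, "CONTROL", 100)
```

In the real shell you would rerun the Cypher statement (or script it in a loop) until `COUNT(n)` comes back as 0, exactly as the quoted blog post suggests with its offset/pagesize loop.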