3
votes

So I have around 70 million spatial records that i want to add to the spatial layer (I've tested with a small set and everything is smoothly, queries returning the same results as postgis and the layer operation seems fine) But when i try to add all the spatial records to the database, the performance degrades rapidly to the point that it gets really slow at around 5 million (around 2h running time) records and hangs at ~7.7 million (8 hours lapsed).

Since the spatial index is an Rtree that uses the graph structure to construct itself, i am wondering why is it degrading when the number os records increase. Rtree insertions are O(n) if im not mistaken and thats why im concerned it might be something between the rearranging of bounding boxes, nodes that are not tree leaves that are causing the addToLayer process to get slower over time.

Currently im adding nodes to the layer like that (lots of hardcoded stuff since im trying to figure out the problem before patterns and code style):

Transaction tx = database.beginTx();
    try {

        ResourceIterable<Node> layerNodes = GlobalGraphOperations.at(database).getAllNodesWithLabel(label);
        long i = 0L;
        for (Node node : layerNodes) {
            Transaction tx2 = database.beginTx();
            try {
                layer.add(node);
                i++;
                if (i % commitInterval == 0) {
                    log("indexing (" + i + " nodes added) ... time in seconds: "
                            + (1.0 * (System.currentTimeMillis() - startTime) / 1000));
                }
                tx2.success();
            } finally {
                tx2.close();
            }
        }
        tx.success();
    } finally {
        tx.close();
    }

Any thoughts ? Any ideas of how performance could be increased ?

ps.: using java API Neo4j 2.1.2, Spatial 0.13 Core i5 3570k @4.5Ghz, 32GB ram dedicated 2TB 7200 hard drive to the database (no OS, no virtual memory files, only the data itself)

ps2.: All geometries are LineStrings (if thats important :P) they represent streets, roads, etc..

ps3.: the nodes are already in the database, i only need to add them to the Layer so that i can perform spatial queries, bbox and wkb attributes are OK, tested and working for a small set.

Thank you in advance

After altering and running the code again (which takes 5hours only to insert the points into the database, no layer involved) this happened, will try to increase the jvm heap and the embeddedgraph memory parameters.

indexing (4020000 nodes added) ... time in seconds: 8557.361
Exception in thread "main" org.neo4j.graphdb.TransactionFailureException: Unable to commit transaction
    at org.neo4j.kernel.TopLevelTransaction.close(TopLevelTransaction.java:140)
    at gis.CataImporter.addDataToLayer(CataImporter.java:263)
    at Neo4JLoadData.addDataToLayer(Neo4JLoadData.java:138)
    at Neo4JLoadData.main(Neo4JLoadData.java:86)
Caused by: javax.transaction.SystemException: Kernel has encountered some problem, please perform neccesary action (tx recovery/restart)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:408)
    at org.neo4j.kernel.impl.transaction.KernelHealth.assertHealthy(KernelHealth.java:61)
    at org.neo4j.kernel.impl.transaction.TxManager.assertTmOk(TxManager.java:339)
    at org.neo4j.kernel.impl.transaction.TxManager.getTransaction(TxManager.java:725)
    at org.neo4j.kernel.TopLevelTransaction.close(TopLevelTransaction.java:119)
    ... 3 more
Caused by: javax.transaction.xa.XAException
    at org.neo4j.kernel.impl.transaction.TransactionImpl.doCommit(TransactionImpl.java:560)
    at org.neo4j.kernel.impl.transaction.TxManager.commit(TxManager.java:448)
    at org.neo4j.kernel.impl.transaction.TxManager.commit(TxManager.java:385)
    at org.neo4j.kernel.impl.transaction.TransactionImpl.commit(TransactionImpl.java:123)
    at org.neo4j.kernel.TopLevelTransaction.close(TopLevelTransaction.java:124)
    at gis.CataImporter.addDataToLayer(CataImporter.java:256)
    ... 2 more
Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
    at org.neo4j.kernel.impl.nioneo.store.DynamicRecord.clone(DynamicRecord.java:179)
    at org.neo4j.kernel.impl.nioneo.store.PropertyBlock.clone(PropertyBlock.java:215)
    at org.neo4j.kernel.impl.nioneo.store.PropertyRecord.clone(PropertyRecord.java:221)
    at org.neo4j.kernel.impl.nioneo.xa.Loaders$2.clone(Loaders.java:118)
    at org.neo4j.kernel.impl.nioneo.xa.Loaders$2.clone(Loaders.java:81)
    at org.neo4j.kernel.impl.nioneo.xa.RecordChanges$RecordChange.ensureHasBeforeRecordImage(RecordChanges.java:217)
    at org.neo4j.kernel.impl.nioneo.xa.RecordChanges$RecordChange.prepareForChange(RecordChanges.java:162)
    at org.neo4j.kernel.impl.nioneo.xa.RecordChanges$RecordChange.forChangingData(RecordChanges.java:157)
    at org.neo4j.kernel.impl.nioneo.xa.PropertyCreator.primitiveChangeProperty(PropertyCreator.java:64)
    at org.neo4j.kernel.impl.nioneo.xa.NeoStoreTransactionContext.primitiveChangeProperty(NeoStoreTransactionContext.java:125)
    at org.neo4j.kernel.impl.nioneo.xa.NeoStoreTransaction.nodeChangeProperty(NeoStoreTransaction.java:1244)
    at org.neo4j.kernel.impl.persistence.PersistenceManager.nodeChangeProperty(PersistenceManager.java:119)
    at org.neo4j.kernel.impl.api.KernelTransactionImplementation$1.visitNodePropertyChanges(KernelTransactionImplementation.java:344)
    at org.neo4j.kernel.impl.api.state.TxStateImpl$6.visitPropertyChanges(TxStateImpl.java:238)
    at org.neo4j.kernel.impl.api.state.PropertyContainerState.accept(PropertyContainerState.java:187)
    at org.neo4j.kernel.impl.api.state.NodeState.accept(NodeState.java:148)
    at org.neo4j.kernel.impl.api.state.TxStateImpl.accept(TxStateImpl.java:160)
    at org.neo4j.kernel.impl.api.KernelTransactionImplementation.createTransactionCommands(KernelTransactionImplementation.java:332)
    at org.neo4j.kernel.impl.api.KernelTransactionImplementation.prepare(KernelTransactionImplementation.java:123)
    at org.neo4j.kernel.impl.transaction.xaframework.XaResourceManager.prepareKernelTx(XaResourceManager.java:900)
    at org.neo4j.kernel.impl.transaction.xaframework.XaResourceManager.commit(XaResourceManager.java:510)
    at org.neo4j.kernel.impl.transaction.xaframework.XaResourceHelpImpl.commit(XaResourceHelpImpl.java:64)
    at org.neo4j.kernel.impl.transaction.TransactionImpl.doCommit(TransactionImpl.java:548)
    ... 7 more

28/07 -> Increasing memory did not help, now im testing some modifications in the RTreeIndex and LayerRTreeIndex (what exactly does the field maxNodeReferences does ?

// Constructor

public LayerRTreeIndex(GraphDatabaseService database, Layer layer) {
    this(database, layer, 100);     
}

public LayerRTreeIndex(GraphDatabaseService database, Layer layer, int maxNodeReferences) {
    super(database, layer.getLayerNode(), layer.getGeometryEncoder(), maxNodeReferences);
    this.layer = layer;
}

It is hardcoded to 100, and changing its value changes when (number of nodes added wise) my addToLayer method crashes into OutOfMemory error, If im not mistaken, changing that field's value increases or decreases the tree's width and depth (being 100 wider than 50, and 50 being deeper than 100).

To summarize the progress so far:

  • Incorrect usage of transactions corrected by @Jim
  • Memory Heap increased to 27GB following @Peter 's advice
  • 3 spatial layers to go, but now the problem gets real because they're the big ones.
  • Did some memory profiling while adding nodes to the spatial layer and i found interesting points.

Memory and GC profiling: http://postimg.org/gallery/biffn9zq/

The type that uses the most memory througout the entire process is the byte[], which i can only assume belongs to the geometries wkb properties (either the geometry itself or the rtree's bbox). Having that in mind, I also noticed (you can check on the new profiling images) that the ammount of heap space used never goes below the 18GB mark.

According to this question are java primitives garbage collected primitive types in java are raw data, therefore not being subjected to garbage collection, and are only freed from the method's stack when the method returns (so maybe when i create a new spatial layer, all those wkb byte arrays will remain in memory until I manually close the layer object).

Does that make any sense ? isnt there a better way to manage memory resources so that the layer doesnt keep unused, old data loaded ?

3
What layer class are you using?Jim Biard
Here's some profiling info, dont know if it helps CPU time methods Self time methods CPU time classescatacavaco
Google indicates there are some issues in the Neo4j rtree implementation. Looks like there might be a bug.Has QUIT--Anony-Mousse
Looks like im facing some weird behavior when adding nodes to the spatial layer, usually it crashes with that exception when getting close to 4 million nodescatacavaco

3 Answers

2
votes

Catacavaco,

You are doing each add as a separate transaction. To make use of your commitInterval, you need to change your code to something like this.

Transaction tx = database.beginTx();

try {
    ResourceIterable<Node> layerNodes = GlobalGraphOperations.at(database).getAllNodesWithLabel(label);

    long i = 0L;

    for (Node node : layerNodes) {
        layer.add(node);
        i++;

        if (i % commitInterval == 0) {
            tx.success();
            tx.close();

            log("indexing (" + i + " nodes added) ... time in seconds: "
                + (1.0 * (System.currentTimeMillis() - startTime) / 1000));

            tx = database.beginTx();
        }
    }

    tx.success();
} finally {
    tx.close();
}

See if this helps.

Grace and peace,

Jim

2
votes

Looking at Error java.lang.OutOfMemoryError: GC overhead limit exceeded, there might be some excessive object creation going on. From your profiling results it doesn't look like it, could you double check?

1
votes

Finally solved the problem with three fixes: setting cache_type=none increasing heap size for neostore low level graph engine and setting use_memory_mapped_buffers=true so that memory management is done by the OS and not the slowish JVM

that way, my custom batch insertion in the spatial layers went much faster, and without any errors/exceptions

Thanks for all the help provided, i guess my answer is just a combination of all the tips people provided here, thanks very much