Optimize Neo4j Cypher path finding with limited paths in an undirected graph

Question

This is a follow-up to the question "Neo4j Cypher path finding slow in undirected graph". Michael Hunger and Wes Freeman kindly helped there, but I failed to adapt the techniques I learned to path-finding queries that should return the paths themselves.

The issue:

The query below takes roughly 3 s and returns 13 rows (the paths found) from the database. I find that slow and would like it to execute faster, but I don't know how to optimize it. (This is just an example; I find other, similar queries slow too.)

START n=node:NodeIds('id:4000'), t=node:NodeIds('id:10778')   
MATCH path = (n)-[:ASSOCIATIVY_CONNECTION*1..3]-(t)   
RETURN nodes(path) AS Nodes

And the same with profile data:

neo4j-sh (0)$ profile START n=node:NodeIds('id:4000'), t=node:NodeIds('id:10778')    MATCH path = (n)-[:ASSOCIATIVY_CONNECTION*1..3]-(t)    RETURN nodes(path) AS Nodes;
==> +-------------------------------------------------------------------------------------------+
==> | Nodes                                                                                     |
==> +-------------------------------------------------------------------------------------------+
==> | [Node[3984]{Id:4000},Node[986]{Id:1001},Node[18536]{Id:18552},Node[10763]{Id:10778}]      |
==> | [Node[3984]{Id:4000},Node[1085]{Id:1100},Node[9955]{Id:9970},Node[10763]{Id:10778}]       |
==> | [Node[3984]{Id:4000},Node[133348]{Id:133364},Node[9955]{Id:9970},Node[10763]{Id:10778}]   |
==> | [Node[3984]{Id:4000},Node[111409]{Id:111425},Node[18536]{Id:18552},Node[10763]{Id:10778}] |
==> | [Node[3984]{Id:4000},Node[64464]{Id:64480},Node[18536]{Id:18552},Node[10763]{Id:10778}]   |
==> | [Node[3984]{Id:4000},Node[64464]{Id:64480},Node[9955]{Id:9970},Node[10763]{Id:10778}]     |
==> | [Node[3984]{Id:4000},Node[64464]{Id:64480},Node[10763]{Id:10778}]                         |
==> | [Node[3984]{Id:4000},Node[64464]{Id:64480},Node[64455]{Id:64471},Node[10763]{Id:10778}]   |
==> | [Node[3984]{Id:4000},Node[79152]{Id:79168},Node[18536]{Id:18552},Node[10763]{Id:10778}]   |
==> | [Node[3984]{Id:4000},Node[69190]{Id:69206},Node[18536]{Id:18552},Node[10763]{Id:10778}]   |
==> | [Node[3984]{Id:4000},Node[25893]{Id:25909},Node[18536]{Id:18552},Node[10763]{Id:10778}]   |
==> | [Node[3984]{Id:4000},Node[31683]{Id:31699},Node[18536]{Id:18552},Node[10763]{Id:10778}]   |
==> | [Node[3984]{Id:4000},Node[6965]{Id:6980},Node[18536]{Id:18552},Node[10763]{Id:10778}]     |
==> +-------------------------------------------------------------------------------------------+
==> 13 rows
==> 2824 ms
==> 
==> ColumnFilter(symKeys=["path", "n", "t", "  UNNAMED3", "Nodes"], returnItemNames=["Nodes"], _rows=13, _db_hits=0)
==> Extract(symKeys=["n", "t", "  UNNAMED3", "path"], exprKeys=["Nodes"], _rows=13, _db_hits=0)
==>   ExtractPath(name="path", patterns=["  UNNAMED3=n-[:ASSOCIATIVY_CONNECTION*1..3]-t"], _rows=13, _db_hits=0)
==>     PatternMatch(g="(n)-['  UNNAMED3']-(t)", _rows=13, _db_hits=0)
==>       Nodes(name="t", _rows=1, _db_hits=1)
==>         Nodes(name="n", _rows=1, _db_hits=1)
==>           ParameterPipe(_rows=1, _db_hits=0)

The setup:

The Neo4j graph database has 165k nodes and 266k relationships, all of which are undirected (bidirectional) and have the type "ASSOCIATIVY_CONNECTION". None of the nodes are connected to the root node. Apart from the nodes and relationships, only one integer value is stored for each node (the graph database stores only the structure, not the actual data).

The memory configuration for this database is as follows:

wrapper.java.initmemory=1024
wrapper.java.maxmemory=1024

neostore.nodestore.db.mapped_memory=225M
neostore.relationshipstore.db.mapped_memory=250M
neostore.propertystore.db.mapped_memory=290M
neostore.propertystore.db.strings.mapped_memory=330M
neostore.propertystore.db.arrays.mapped_memory=330M

The dataset is a graph generated by following interconnections between Wikipedia articles and is downloadable from here.

I run Neo4j 1.9.M05 Community on a Windows 8 machine, started from Neo4j.bat. I don't think hardware is the issue, as the query causes only a short 10% CPU spike and there are GBs of free RAM available.

I'd be very thankful for pointers on how to make this query run faster.

Edit: I tried the same query on a slightly enhanced version of the same graph, with 283k nodes and 538k relationships. It now takes 20 seconds!

Edit 2, increasing memory limits: As advised by Michael, I raised the wrapper.java.initmemory and wrapper.java.maxmemory settings to 8192 (8 GB). This increased the memory footprint of the Java process running Neo4j to 2.25 GB and also improved the performance of the query: it now takes about 1 s on warmed-up runs (after the third run). I also raised the memory settings in the neo4j.properties config file to 2 GB each, but that had no noticeable effect. For all this to work I needed the 64-bit Java runtime (the default one you can easily download for your browser is a 32-bit version), so I downloaded the manual installer for it. Once it is installed, Neo4j automatically starts with it instead of the 32-bit version.

Michael Hunger · Accepted Answer · 2013-03-28T18:09:45

As you are running on Windows, please increase your heap size: on Windows, the MMIO (memory-mapped I/O) direct memory is allocated as part of the Java heap.
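
For reference, here is a sketch of the settings this refers to, using the values from the question and its second edit rather than tuned recommendations. It assumes the wrapper.java.* values are given in MB and live in conf/neo4j-wrapper.conf, while the mapped-memory settings live in conf/neo4j.properties; the right heap size depends on how much RAM your machine can spare.

```properties
# conf/neo4j-wrapper.conf (assumed location), values assumed to be in MB.
# 8192 is the value the asker reports using in "Edit 2", not a recommendation.
wrapper.java.initmemory=8192
wrapper.java.maxmemory=8192

# conf/neo4j.properties, unchanged from the question.
# These five settings add up to 225 + 250 + 290 + 330 + 330 = 1425M.
# If mapped memory counts against the heap on Windows, as described above,
# that total alone exceeds the original 1024 MB heap.
neostore.nodestore.db.mapped_memory=225M
neostore.relationshipstore.db.mapped_memory=250M
neostore.propertystore.db.mapped_memory=290M
neostore.propertystore.db.strings.mapped_memory=330M
neostore.propertystore.db.arrays.mapped_memory=330M
```

The point of the arithmetic is only that the heap has to leave room for the mapped stores plus the query's working memory; a heap larger than the sum of the mapped-memory settings is the minimum to aim for under that assumption.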
