
I have two nodes. The first, "b1", has 16M relationships; the second, "b", has 17k. Label B is indexed on the id property.

My query to check whether they have a direct relationship is:

profile 
MATCH (b:B {id :'D006019' }) WITH b 
MATCH (b1:B {id :'D006801' }) WITH b, b1 
MATCH (b)-[r]-(b1) RETURN r

Several observations:

  • The query is extremely slow: it runs for about 5 minutes. It starts with a nodeindexscan, which is very fast, but then it somehow picks node "b1" and continues execution by expanding that node. But "b1" has 16M relationships, and this expansion, together with the filter that follows, ruins the performance.

  • I can make this query fast enough if I change it a little.

Here is the much faster query:

profile 
MATCH (b:B {id :'D006019' }) WITH b 
MATCH (b1:B) WHERE b1.id IN ['D006801' ] WITH b, b1   
MATCH (b)-[r]-(b1)  RETURN r 

Now that "b1" is matched through the IN clause, Neo4j starts expanding from "b", which has only 17k relationships, and the query executes in around 100 ms.

My question is: can the query be written so that Neo4j automatically expands from the less connected node?

Isn't a query like MATCH (b1:B {id:'D006801'})-[r]-(b:B {id :'D006019' }) RETURN r enough (and faster)? - Bruno Peres
It doesn't change anything. Again, it starts expanding from "b1". One more thing: I swapped their places just to check whether it starts expanding from the last node in the query, but that wasn't the case. It again chose the more heavily connected node. - user732456

1 Answer


Sometimes you have to give Cypher some hints:

MATCH (b:B {id :'D006019'})
USING INDEX b:B(id)
MATCH (b1:B {id :'D006801'})
USING INDEX b1:B(id)
MATCH (b)-[r]-(b1)
RETURN r;

The above query tells Cypher that it should use the :B(id) index for each of the first 2 matches. Without the hints, there is currently a tendency for the planner to only use the index once.
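
To check whether the hints actually took effect, one option is to prefix the hinted query with PROFILE, as the question already does, and inspect the resulting plan. This is only a sketch: the operator names mentioned in the comments are what one would hope to see, not a guaranteed outcome, since the plan depends on the Neo4j version and its statistics.

```cypher
// Same hinted query as above, run under PROFILE to inspect the plan.
// Assumption: if both hints are honored, the plan should show an index
// lookup on :B(id) for b and another for b1, followed by an expand
// between the two already-bound nodes rather than an expand over all
// 16M relationships of b1.
PROFILE
MATCH (b:B {id: 'D006019'})
USING INDEX b:B(id)
MATCH (b1:B {id: 'D006801'})
USING INDEX b1:B(id)
MATCH (b)-[r]-(b1)
RETURN r;
```

If the profile still shows only one index lookup followed by a large expand and a filter, the hints were not applied as intended, and the IN-based rewrite from the question remains a workable fallback.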