3
votes

I have some questions regarding Neo4j's query profiling. Consider the simple Cypher query below:

PROFILE 
MATCH (n:Consumer {mobileNumber: "yyyyyyyyy"}),
      (m:Consumer {mobileNumber: "xxxxxxxxxxx"}) 
WITH n,m 
MATCH (n)-[r:HAS_CONTACT]->(m) 
RETURN n,m,r;

and output is:

[Image: PROFILE output for the query above]

So according to Neo4j's Documentation:

3.7.2.2. Expand Into

When both the start and end node have already been found, expand-into is used to find all connecting relationships between the two nodes.

Query.

MATCH (p:Person { name: 'me' })-[:FRIENDS_WITH]->(fof)-->(p)
RETURN fof

So in my query above, the planner should first find both the start node and the end node before looking for any relationships. Instead, it finds only the start node and then expands all connected :HAS_CONTACT relationships, so the "Expand Into" operator is not used. Why does it work this way? There is only one :HAS_CONTACT relationship between the two nodes, and there is a uniqueness constraint on :Consumer(mobileNumber). Why does the query expand all 7 relationships?

Another question is about the Filter operator: why does it require 12 db hits when all the nodes and relationships have already been retrieved? Why does this operation need 12 db hits for just 6 rows?

Edited

This is the complete Graph I am querying: Graph Data

I have also tested different versions of the same query, but they all return the same query profile:

1

PROFILE
 MATCH (n:Consumer{mobileNumber: "yyyyyyyyy"})
 MATCH (m:Consumer{mobileNumber: "xxxxxxxxxxx"}) 
 WITH n,m 
 MATCH (n)-[r:HAS_CONTACT]->(m) 
 RETURN n,m,r;

2

PROFILE
 MATCH (n:Consumer{mobileNumber: "yyyyyyyyy"}), (m:Consumer{mobileNumber: "xxxxxxxxxxx"}) 
 WITH n,m 
 MATCH (n)-[r:HAS_CONTACT]->(m) 
 RETURN n,m,r;

3

PROFILE 
MATCH (n:Consumer{mobileNumber: "yyyyyyyyy"}) 
WITH n 
MATCH (n)-[r:HAS_CONTACT]->(m:Consumer{mobileNumber: "xxxxxxxxxxx"}) 
RETURN n,m,r;

Could you post some data here to allow us to better understand your graph? This may (and this is a bit of a guess based on experience) have to do with the query planner trying to reduce the load created by producing a cartesian product (i.e., in the first MATCH clause). Have you tried changing the first part of the query to use two MATCH clauses rather than one with a comma? - Dom Weldon
@DomWeldon Kindly check the updated question. - Afridi

2 Answers

2
votes

The query you are executing and the example provided in the Neo4j documentation for Expand Into are not the same. The example query starts and ends at the same node.

If you want the planner to find both nodes first and then check whether a relationship exists between them, you could use shortestPath with a length of 1 to minimize the db hits.

PROFILE 
MATCH (n:Consumer {mobileNumber: "yyyyyyyyy"}),
  (m:Consumer {mobileNumber: "xxxxxxxxxxx"}) 
WITH n,m 
MATCH Path=shortestPath((n)-[r:HAS_CONTACT*1]->(m))
RETURN n,m,r;
2
votes

Why does this do this?

It appears that this behaviour relates to how the query planner searches the database in response to your Cypher query. Cypher provides an interface to search and perform operations in the graph (alternatives include the Java API, etc.). Queries are handled by the query planner and then turned into graph operations by Neo4j's internals. It makes sense that the query planner will look for what is likely to be the most efficient way to search the graph (hence why we love neo), so just because a Cypher query is written one way, it won't necessarily search the graph the way we imagine it will in our heads.

The documentation on this seemed a little sparse (or rather, I couldn't find it), so any links or further explanations would be much appreciated.

Examining your query, I think you're trying to say this:

"Find two nodes each with a :Consumer label, n and m, with contact numbers x and y respectively, using the mobileNumber index. If you find them, try and find a -[:HAS_CONTACT]-> relationship from n to m. If you find the relationship, return both nodes and the relationship, else return nothing."

Running this query in this way requires a cartesian product to be created (i.e., a little table of all combinations of n and m - in this case only one row - but for other queries potentially many more), and then relationships to be searched for between each of these rows.
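
To see the cartesian product in isolation, one could profile only the node lookups from the question. This is a sketch that reuses the placeholder mobile numbers; the plan would be expected to show a CartesianProduct operator combining two index seeks, but the exact operators depend on the Neo4j version and the data.

```cypher
// Profile only the two node lookups, with no relationship pattern.
// n and m are disconnected patterns, so their rows must be combined pairwise.
PROFILE
MATCH (n:Consumer {mobileNumber: "yyyyyyyyy"}),
      (m:Consumer {mobileNumber: "xxxxxxxxxxx"})
RETURN n, m;
```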

Rather than doing that, since a MATCH clause must be satisfied for the query to continue, neo knows that the two nodes n and m must be connected via the -[:HAS_CONTACT]-> relationship if the query is to return anything. Thus, the most efficient way to run the query (and avoid the cartesian product) is as below, which is what your query can be simplified to.

"Find a node n with the :Consumer label, and value x for the index mobileNumber, which is connected via a -[:HAS_CONTACT]-> relationship to a node m with the :Consumer label, and value y for its property mobileNumber. Return both nodes and the relationship, else return nothing."

So, rather than perform two index searches, a cartesian product and a set of expand into operations, neo performs only one index search, an expand all, and a filter.
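
Written out as Cypher, the simplified reading would look roughly like the single-pattern query below. This is only an illustration of the form the planner appears to choose, not a dump of its internal rewrite; the mobile numbers are the placeholders from the question.

```cypher
// One pattern: seek n by index, expand its HAS_CONTACT relationships,
// then filter the far end on label and mobileNumber.
PROFILE
MATCH (n:Consumer {mobileNumber: "yyyyyyyyy"})-[r:HAS_CONTACT]->(m:Consumer {mobileNumber: "xxxxxxxxxxx"})
RETURN n, m, r;
```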

You can see the result of this simplification by the query planner through the presence of AUTOSTRING parameters in your query profile.

How to Change Query to Implement Search as Desired

If you want to change the query so that it must use the Expand Into operator, either make the relationship match optional or force explicitly iterative execution. Both of the queries below should produce the query profile you initially expected.

Optional example:

PROFILE
 MATCH (n:Consumer{mobileNumber: "xxx"})
 MATCH (m:Consumer{mobileNumber: "yyy"}) 
 WITH n,m 
 OPTIONAL MATCH (n)-[r:HAS_CONTACT]->(m) 
 RETURN n,m,r;

Iterative example:

PROFILE
 MATCH (n1:Consumer{mobileNumber: "xxx"})
 MATCH (m:Consumer{mobileNumber: "yyy"}) 
 UNWIND COLLECT(n1) AS n
 MATCH (n)-[r:HAS_CONTACT]->(m) 
 RETURN n,m,r;
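
Another option that may be worth profiling is planner hints. The sketch below assumes the index that backs the uniqueness constraint on :Consumer(mobileNumber) mentioned in the question, and asks for an index lookup on both nodes with USING INDEX. It is untested against the graph in the question, and the planner may combine the two seeks with a join rather than Expand Into depending on the version, so check the resulting profile.

```cypher
// Hint that both end nodes should be found via the mobileNumber index.
// Assumption: an index (or constraint-backed index) exists on :Consumer(mobileNumber).
PROFILE
MATCH (n:Consumer {mobileNumber: "xxx"})-[r:HAS_CONTACT]->(m:Consumer {mobileNumber: "yyy"})
USING INDEX n:Consumer(mobileNumber)
USING INDEX m:Consumer(mobileNumber)
RETURN n, m, r;
```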