Role of variables in cypher match query

Question

I am taking my first steps with Cypher and Neo4j and trying to understand how Cypher deals with "variables".

Specifically, I have a query

match (A {name: "A"})
match (A)<-[:st*]-(C)-[:hp]->(c)
match (A)<-[:st*]-(B)-[:hp]->(b)
match (c)-[:st]->(b)
return b

which does the job I want. However, the query uses essentially the same match clause twice (lines 2 and 3), so that the variables (c) and (b) basically contain the same nodes before the final match on line 4. Can I write the query without having to repeat the second match clause? Using

match (A {name: "A"})
match (A)<-[:st*]-(B)-[:hp]->(b)
match (b)-[:st]->(b)
return b

seems to be something very different, returning nothing since there are no :st type relationships from a node in (b) to itself. My understanding so far is that, even if (b) and (c) contain the same nodes,

match (c)-[:st]->(b)

tries to find matches between ANY node of (c) and ANY node of (b), whereas

match (b)-[:st]->(b)

tries to find matches from a particular node of (b) onto itself? Or is it that one has to think of the 3 match clauses as a holistic pattern?

Thanks for any insight into the inner workings.

Frank Pavageau · Accepted Answer · 2016-08-23T11:55:26

When you write the 2 MATCH statements

match (A)<-[:st*]-(C)-[:hp]->(c)
match (A)<-[:st*]-(B)-[:hp]->(b)

they don't depend on each other's results (only on the result of the previous MATCH finding A). The Cypher engine could execute them independently and then return the cartesian product of their results, or it could execute the first MATCH and then, for each of its results, execute the second MATCH, producing a series of pairs made of the current result of the first MATCH and each result of the second MATCH (the actual implementation is a detail). It could even detect that the same pattern is matched twice, execute it only once and generate all possible pairs from the results.

To summarize, b and c are taken from the same collection of results, but independently, so you'll get pairs where b and c are the same node, but also all the other pairs where they are not.

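If you use a single MATCH, b is bound to a single node in each result, so a pattern like (b)-[:st]->(b) can only match a relationship from that node to itself.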

Suppose the intermediate MATCH returns 2 nodes, 1 and 2. With the 2 intermediate MATCH clauses, the final MATCH will see all 4 pairs:

     1       2
1  (1, 1)  (1, 2)
2  (2, 1)  (2, 2)

whereas with a single intermediate MATCH and a final MATCH using b twice, it will only see:

     1       2
1  (1, 1)
2          (2, 2)

which are not the interesting pairs, if you don't have self-relationships.
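The two tables above can be reproduced outside Neo4j. The following is a minimal Python sketch of the pairing logic only, not of how the Cypher engine is implemented. It assumes the intermediate MATCH yields the two nodes 1 and 2, and it uses a hypothetical set of :st relationships containing only 1 -> 2 (the names `matched` and `st_edges` are made up for illustration).

```python
from itertools import product

# Nodes produced by the intermediate MATCH (the 1 and 2 of the example above).
matched = [1, 2]

# Hypothetical :st relationships, assumed for illustration: only 1 -> 2.
st_edges = {(1, 2)}

# Two independent MATCH clauses: c and b range over the results separately,
# so the final MATCH sees every (c, b) combination.
two_match_pairs = list(product(matched, matched))

# One MATCH with b used twice: both ends are bound to the same node.
one_match_pairs = [(b, b) for b in matched]

# The final MATCH (c)-[:st]->(b) keeps only pairs linked by an :st relationship.
two_match_result = [b for (c, b) in two_match_pairs if (c, b) in st_edges]
one_match_result = [b for (c, b) in one_match_pairs if (c, b) in st_edges]

print(two_match_pairs)
print(one_match_pairs)
print(two_match_result)
print(one_match_result)
```

With these assumed relationships, only the version with two independent variables finds a match, because the pair (1, 2) never appears when both ends are forced to be the same node.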

Note that it's the same in a SQL database if you do a SELECT on 2 tables without a join: you also get a cartesian product of unrelated results.
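The SQL comparison can be tried with a small sketch using Python's built-in sqlite3 module and an in-memory database. The table names `c_nodes` and `b_nodes` are hypothetical and only stand in for the two independent result sets.

```python
import sqlite3

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Two hypothetical tables, each holding the same two ids.
conn.execute("CREATE TABLE c_nodes (id INTEGER)")
conn.execute("CREATE TABLE b_nodes (id INTEGER)")
conn.executemany("INSERT INTO c_nodes VALUES (?)", [(1,), (2,)])
conn.executemany("INSERT INTO b_nodes VALUES (?)", [(1,), (2,)])

# Selecting from both tables without a join condition pairs every row
# of one table with every row of the other.
rows = conn.execute(
    "SELECT c_nodes.id, b_nodes.id FROM c_nodes, b_nodes "
    "ORDER BY c_nodes.id, b_nodes.id"
).fetchall()

print(rows)
conn.close()
```

The query returns the same four pairs as the first table above.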
