Neo4j Cypher queries EXPLAINed identically but warnings are generated just for one

Question

I have a CSV file containing a one-to-many relation: each element of type A is composed of one or more elements of type B, but each element of type B refers to only one element of type A.

An example:

   A   |  B   
-------------
   a1  |  b1
   a1  |  b2
   a1  |  b3
   a2  |  b4

I have already created the nodes in a Neo4j graph, and now I want to create an edge for each of these relationships.

I tried this query:

LOAD CSV WITH HEADERS FROM "file:///file.csv" AS row
WITH row
MATCH (n:A {AID: row.a_id}), (t:B {BID: row.b_id})
MERGE (n)-[:HAS_CONNECTION]->(t);

but Neo4j shows the following warning:

This query builds a cartesian product between disconnected patterns. If a part of a query contains multiple disconnected patterns, this will build a cartesian product between all those parts. This may produce a large amount of data and slow down query processing. While occasionally intended, it may often be possible to reformulate the query that avoids the use of this cross product, perhaps by adding a relationship between the different parts or by using OPTIONAL MATCH (identifier is: (t))

So I changed it to:

LOAD CSV WITH HEADERS FROM "file:///file.csv" AS row
WITH row
MATCH (t:B {BID : row.b_id})
WITH row, t
MATCH (n:A {AID: row.a_id})
MERGE (n)-[:HAS_CONNECTION]->(t);

and Neo4j does not complain.

However, if I EXPLAIN both queries, the resulting plan is the same.

Is Neo4j complaining needlessly about the first query, or are there real benefits to the second?

InverseFalcon · Accepted Answer · 2019-05-29T18:13:50

The warning is accurate: both queries do build a cartesian product. That is fine in this case, because it is exactly what you want: n and t matched together even though they aren't connected yet. The cardinality will be low in any case (likely 1 on each side, if these are unique nodes).

Disregard the warning and keep your first query. That holds whenever you're doing something like this, where the expected number of nodes for each of those variables is 1, or at least small.
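
If AID and BID are meant to be unique identifiers, uniqueness constraints make that explicit and give each MATCH an index to use. A minimal sketch, assuming the property names from your second query (AID on :A, BID on :B); the exact syntax depends on your Neo4j version:

```cypher
// Assumption: AID and BID uniquely identify :A and :B nodes.
// Older syntax (Neo4j 3.x and 4.x):
CREATE CONSTRAINT ON (a:A) ASSERT a.AID IS UNIQUE;
CREATE CONSTRAINT ON (b:B) ASSERT b.BID IS UNIQUE;

// Newer syntax (Neo4j 4.4 and later), shown commented out:
// CREATE CONSTRAINT FOR (a:A) REQUIRE a.AID IS UNIQUE;
// CREATE CONSTRAINT FOR (b:B) REQUIRE b.BID IS UNIQUE;
```

With constraints like these in place, each side of the cartesian product matches at most one node per CSV row.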

As for why the warning doesn't appear for the second query, that's likely just a limitation of the check that generates the warning. The two queries are still equivalent, and the same reasoning applies.
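
To compare the plans yourself, prefix each query with EXPLAIN, which plans the query without executing it. This sketch reuses the file name and properties from the question and assumes the A identifier is stored as AID, as in your second query:

```cypher
// Form 1: both patterns in a single MATCH (this one triggers the warning).
EXPLAIN
LOAD CSV WITH HEADERS FROM "file:///file.csv" AS row
WITH row
MATCH (n:A {AID: row.a_id}), (t:B {BID: row.b_id})
MERGE (n)-[:HAS_CONNECTION]->(t);

// Form 2: the same patterns split across two MATCH clauses (no warning).
EXPLAIN
LOAD CSV WITH HEADERS FROM "file:///file.csv" AS row
WITH row
MATCH (t:B {BID: row.b_id})
WITH row, t
MATCH (n:A {AID: row.a_id})
MERGE (n)-[:HAS_CONNECTION]->(t);
```

If the two plans list the same operators, as you observed, the rewrite only silences the warning and changes nothing about execution.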

The real purpose of the warning is to keep you from writing something like:

MATCH (a:A), (b:B)

or similar, where you would end up with a cartesian product between all of one kind of node against all of another. When you narrow these down with specific properties (especially unique properties) that's just a 1x1 cartesian product, no issues.
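
You can see the difference by counting the rows each form produces. A rough illustration, assuming the graph holds only the nodes from the sample table in the question (a1, a2 and b1 to b4) and that the identifiers are stored as strings in AID and BID:

```cypher
// Unfiltered: one row per (a, b) pair, so count(:A) * count(:B) rows.
// For the sample table that is 2 * 4 = 8 pairs.
MATCH (a:A), (b:B)
RETURN count(*) AS pairs;

// Filtered on identifying properties: at most one node on each side,
// so at most one row if AID and BID are unique.
MATCH (a:A {AID: 'a1'}), (b:B {BID: 'b1'})
RETURN count(*) AS pairs;
```

On real data the first count grows with the product of the two label sizes, while the second stays at 1 per lookup.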
