In Graph Academy, Exercise 4, Part 5 has this question: 2. Retrieve the movies and their actors where one of the actors also directed the movie, returning the actors' names, the director's name, and the movie title.

I tried this:

MATCH (p:Person)-[:ACTED_IN]->(m:Movie)<-[:DIRECTED]-(d:Person)
WHERE exists((d)-[:ACTED_IN]->(m))
RETURN p.name, d.name, m.title

The result seemed okay, except for some duplicated information: My Result

After getting my result, I looked at the expected query from Graph Academy. It has some small changes: DIRECTED in the MATCH pattern becomes ACTED_IN, and the exists check uses DIRECTED instead, like this:

MATCH (p:Person)-[:ACTED_IN]->(m:Movie)<-[:ACTED_IN]-(d:Person)
WHERE exists((d)-[:DIRECTED]->(m))
RETURN p.name, d.name, m.title

with this result: Correct result

We can see that there isn't any duplicated information, such as actor "Tom Hanks", director "Tom Hanks".

My question is: why does Neo4j behave like this with such a small change?

1 Answer

This has to do with a certain behavior of uniqueness when traversing for a single pattern match.

Cypher uses a uniqueness called RELATIONSHIP_PATH, meaning that for each path, a relationship must be unique - it can only be traversed once per path.

This is done for a variety of reasons, the most notable being that it implicitly prevents infinite loops for variable-length traversals, as infinite loops require you to be able to traverse the same relationships over and over.

In the first query the match is:

MATCH (p:Person)-[:ACTED_IN]->(m:Movie)<-[:DIRECTED]-(d:Person)

In this pattern, the same person node can have both an :ACTED_IN and a :DIRECTED relationship to the same m movie node. Each of those relationships is traversed only once, so the uniqueness rule is satisfied. That means p and d can be the same node, which is why the same person's name comes up for both p and d in the results.
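
To see this for yourself, a query along these lines should list only the rows where p and d are bound to the same node. This is a sketch that assumes the standard movies graph from the exercise; the WHERE p = d filter is my addition and is not part of either query in the question.

```cypher
// Same pattern as the first query, but keep only rows where
// the actor and the director are the same node.
MATCH (p:Person)-[:ACTED_IN]->(m:Movie)<-[:DIRECTED]-(d:Person)
WHERE p = d
RETURN p.name, m.title
```

Each row returned here corresponds to one of the "duplicated" rows in the first query's output.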

In the second query the match is:

MATCH (p:Person)-[:ACTED_IN]->(m:Movie)<-[:ACTED_IN]-(d:Person)

In the movies graph, for an actor who acted in a movie, there will only be a single :ACTED_IN relationship between that pair of nodes, never more than one.

Because of this, it is impossible for d to be the same node as p. The :ACTED_IN relationship will be traversed once from the person node to the movie node, and it cannot be reused again to traverse back from the movie node to the person node.
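
The argument relies on there being at most one :ACTED_IN relationship per person and movie pair. If you want to check that assumption on your own data, something like the following should work; it is only a sketch, and the alias rels is a name I made up for illustration.

```cypher
// Count :ACTED_IN relationships per (person, movie) pair and
// report any pair that has more than one.
MATCH (p:Person)-[r:ACTED_IN]->(m:Movie)
WITH p, m, count(r) AS rels
WHERE rels > 1
RETURN p.name, m.title, rels
```

If the assumption holds, this should return no rows. If a pair did have two :ACTED_IN relationships, the second query could match p and d to the same person, because each relationship would still be traversed only once.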

Note that this restriction only applies within a single MATCH or OPTIONAL MATCH. If you break up that single MATCH into multiple clauses, like this:

MATCH (p:Person)-[:ACTED_IN]->(m:Movie)
MATCH (m)<-[:ACTED_IN]-(d:Person)
...

Then you will see entries in the results where p is the same node as d. Since there are two MATCH patterns here, there is no restriction on the relationships traversed between the patterns.
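
If you do split the pattern and still want to exclude those rows, you can state the condition explicitly. The following is a sketch of one way to do it, not the Graph Academy solution; the p <> d comparison is my addition. It reuses the exists(...) form from the question, which newer Neo4j versions may require you to write as EXISTS { (d)-[:DIRECTED]->(m) } instead.

```cypher
// Two MATCH clauses, so relationship uniqueness does not apply
// across them; p <> d excludes the same-node rows explicitly.
MATCH (p:Person)-[:ACTED_IN]->(m:Movie)
MATCH (m)<-[:ACTED_IN]-(d:Person)
WHERE p <> d AND exists((d)-[:DIRECTED]->(m))
RETURN p.name, d.name, m.title
```

The same p <> d predicate can be added to the WHERE clause of the first query in the question if you prefer to keep the :DIRECTED relationship in the MATCH pattern.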