Cypher query to find the "best" person to introduce Tom Hanks to Tom Cruise

Question

I'm going through the Neo4j 3.0.6 Movie Graph example and am at the part where we "Find someone to introduce Tom Hanks to Tom Cruise." After executing

MATCH (tom:Person {name:"Tom Hanks"})-[:ACTED_IN]->(m)<-[:ACTED_IN]-(coActors),
      (coActors)-[:ACTED_IN]->(m2)<-[:ACTED_IN]-(cruise:Person {name:"Tom Cruise"})
RETURN tom, m, coActors, m2, cruise

I get a graph of the matching paths.

What Cypher query would rank the co-actors so that those with the most connections to both Tom Hanks and Tom Cruise come first? The result would look similar to:

Name        , connecting_movies, (OR) connecting_edges
Meg Ryan    , 4                , 8
Bonnie Hunt , 2                , 4
Kevin Bacon , 2                , 4


InverseFalcon · Accepted Answer · 2016-10-17T18:21:12

Since you're only looking at one relationship type (:ACTED_IN), counting the common movies should be sufficient, and you can leave out the edges. The edge count would be twice the movie count anyway, unless one of the actors played multiple roles within the same movie, and that doesn't seem like a meaningful measure of a stronger connection.

However, we have to make sure we count only distinct movies. A single movie could feature all three people (Tom, Tom, and the coactor), and we want to count that movie once, not twice. To get that distinct count, we need to combine both columns of movies (m and m2) into a single column and then count the distinct movies in that column.

Unfortunately, at this point in time Neo4j's UNION won't allow us to continue to work on the unioned results (to get the count). Instead, we turn each column of movies into a collection, add the collections together, and then unwind the combined collection back into a single column of movies. The final query looks like this:

MATCH (:Person {name:"Tom Hanks"})-[:ACTED_IN]->(m)<-[:ACTED_IN]-(coActors)
MATCH (coActors)-[:ACTED_IN]->(m2)<-[:ACTED_IN]-(:Person {name:"Tom Cruise"})
WITH coActors, collect(m) + collect(m2) as allMovies
UNWIND allMovies as moviesInCommon
RETURN coActors, COUNT(DISTINCT moviesInCommon) as commonMoviesCnt
ORDER BY commonMoviesCnt DESC

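EDIT: I split your single MATCH into two MATCH clauses, because within a single MATCH a movie bound to m cannot also be bound to m2. Cypher will not traverse the same relationship twice within one MATCH, so the coactor's ACTED_IN relationship to a movie cannot serve both halves of the pattern. (I encourage you to change the query in your question to use two MATCH clauses as well.) While the single-MATCH form would give us distinct counts (something we want), it would also prevent us from correctly matching coactors who acted alongside both actors of interest in the same movie.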
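
The effect of that uniqueness rule can be seen in isolation. The following is only a minimal sketch, assuming the stock Movie Graph, where each actor has a single ACTED_IN relationship per movie (multiple roles are stored as a property on that one relationship). The variable names r1, r2 and sameMovieRows are illustrative and do not appear in the queries above.

```cypher
// One MATCH with a comma-separated pattern: r1 and r2 must be
// different relationships, so m and m2 can only be the same movie
// if the actor has two separate ACTED_IN relationships to it.
MATCH (p:Person {name:"Tom Hanks"})-[r1:ACTED_IN]->(m),
      (p)-[r2:ACTED_IN]->(m2)
WHERE m = m2
RETURN count(*) AS sameMovieRows
```

Under that assumption this should return 0. If the pattern is split into two MATCH clauses, the restriction no longer applies across them, and the same WHERE m = m2 should yield one row per movie Tom Hanks acted in.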

You can test this by changing your persons of interest to Tom Hanks and Meg Ryan. Of course, they already know each other and need no introduction, but this pair better shows which queries work correctly when both actors appeared in the same movie(s).
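
As a sketch of that test, here is the final query with only the second name swapped. Returning coActors.name under the alias name is a small assumption made to get output shaped like the table in the question; the rest is the query from above.

```cypher
// Same two-MATCH query, with Meg Ryan in place of Tom Cruise.
MATCH (:Person {name:"Tom Hanks"})-[:ACTED_IN]->(m)<-[:ACTED_IN]-(coActors)
MATCH (coActors)-[:ACTED_IN]->(m2)<-[:ACTED_IN]-(:Person {name:"Meg Ryan"})
// Merge both movie columns into one list per coactor, then flatten it again.
WITH coActors, collect(m) + collect(m2) AS allMovies
UNWIND allMovies AS moviesInCommon
// DISTINCT counts a movie shared by all three people only once.
RETURN coActors.name AS name, count(DISTINCT moviesInCommon) AS commonMoviesCnt
ORDER BY commonMoviesCnt DESC
```

Running this next to the single-MATCH version of the same query should make the difference visible: coactors whose only link to both actors is a movie that all three appeared in would be expected to show up here but not in the single-MATCH results.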
