2
votes

As I am new to Neo4j, I am currently experimenting with the Neo4j movie database sample.

I was wondering what the best way was to compare subgraphs and relationships, for example, how to get all movies with identical crew.

Based on other questions here on Stack Overflow, I managed to return all movies in which a given set of actors acted together:

WITH ['Tom Hanks', 'Meg Ryan'] as names
MATCH (p:Person)
WHERE p.name in names
WITH collect(p) as persons
WITH head(persons) as head, tail(persons) as persons
MATCH (head)-[:ACTED_IN]->(m:Movie)
WHERE ALL(p in persons WHERE (p)-[:ACTED_IN]->(m))
RETURN m.title

But how could I retrieve movies with identical actors without specifying the actors' names?


2 Answers

2
votes

Some alternate approaches that may be more efficient (check using PROFILE):

Match from movies to actors only once, collect the results, then UNWIND that collection twice to generate the cross product, and filter and compare the resulting pairs. This saves you from hitting the db multiple times, since all the data you need comes from the first match. I'm going to borrow Bruno's query and tweak it a bit.

// match every movie and all its actors
match (m1:Movie)<-[:ACTED_IN]-(a1:Person)
// order actors by name
with m1, a1 order by a1.name
// store ordered actors into actors1 variable
with m1, collect(a1) as actors1
// collect this data into a single collection
with collect({m:m1, actors:actors1}) as data
// generate cross product of the data
unwind data as d1
unwind data as d2
with d1, d2
// prevent comparison against the same movie, or the same pairs in different orders
where id(d1.m) < id(d2.m) and d1.actors = d2.actors
// return movies that have the same actors
return d1.m, d2.m

Alternatively, you can group movies by their ordered actor lists and return only the groups that contain more than one movie:

// match every movie and all its actors
match (m1:Movie)<-[:ACTED_IN]-(a1:Person)
// order actors by name
with m1, a1 order by a1.name
// store ordered actors into actors1 variable
with m1, collect(a1) as actors1
// group movies with their sets of actors
with collect(m1) as movies, actors1
// only interested in where multiple movies have the same actor sets
where size(movies) > 1
// return the collection of movies with the same actors
return movies

The second query is likely better here, as you get all movies with the same cast, rather than getting pairs per row.
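To compare the approaches as suggested, a query can be prefixed with PROFILE, which runs it and reports the executed plan with rows and db hits per operator. The following is a minimal sketch of the grouping query with PROFILE applied. Returning titles and actor names instead of whole nodes is my own variation for readability, not part of the query above.

```cypher
// PROFILE executes the query and shows the plan with rows and db hits
PROFILE
MATCH (m:Movie)<-[:ACTED_IN]-(a:Person)
// sort so that identical casts produce identical lists
WITH m, a ORDER BY a.name
WITH m, collect(a) AS cast
// the non-aggregated `cast` acts as the grouping key for collect()
WITH cast, collect(m.title) AS titles
// keep only casts shared by more than one movie
WHERE size(titles) > 1
// variation: return names and titles rather than full nodes
RETURN [p IN cast | p.name] AS actors, titles
```

Running the other queries with the same prefix lets you compare the total db hits on your own data.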

2
votes

This query should work:

// match the first movie and all its actors
match (m1:Movie)<-[:ACTED_IN]-(a1:Person)
// order actors by name
with m1, a1 order by a1.name
// store ordered actors into actors1 variable
with m1, collect(a1) as actors1
// match the second movie and all its actors
match (m2:Movie)<-[:ACTED_IN]-(a2:Person)
// avoid matching the same movie, or the same pair in the opposite order
where id(m1) > id(m2)
// order actors of m2 by name
with m1, m2, actors1, a2 order by a2.name
// store ordered actors of m2 into actors2 variable
// pass to the next context only when the ordered lists (actors1 and actors2) are equal
with m1, m2, actors1, collect(a2) as actors2 where actors1 = actors2
// return movies that have the same actors
return m1, m2 

Using the movie database (:play movie graph), this query produced the following output:

╒══════════════════════════════════════════════════════════════════════╤══════════════════════════════════════════════════════════════════════╕
│"m1"                                                                  │"m2"                                                                  │
╞══════════════════════════════════════════════════════════════════════╪══════════════════════════════════════════════════════════════════════╡
│{"title":"The Matrix Revolutions","tagline":"Everything that has a beg│{"title":"The Matrix Reloaded","tagline":"Free your mind","released":2│
│inning has an end","released":2003}                                   │003}                                                                  │
└──────────────────────────────────────────────────────────────────────┴──────────────────────────────────────────────────────────────────────┘
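The table above is wide because each column holds a full node. As a sketch, the same pairwise comparison can return only the titles. The aliases movie1 and movie2 are illustrative names of my own, and the explicit AS on collect(a2) is an adjustment to the query as posted.

```cypher
// collect each movie's actors, ordered by name
MATCH (m1:Movie)<-[:ACTED_IN]-(a1:Person)
WITH m1, a1 ORDER BY a1.name
WITH m1, collect(a1) AS actors1
// pair with every other movie, each pair only once
MATCH (m2:Movie)<-[:ACTED_IN]-(a2:Person)
WHERE id(m1) > id(m2)
WITH m1, m2, actors1, a2 ORDER BY a2.name
WITH m1, m2, actors1, collect(a2) AS actors2
// keep pairs whose ordered actor lists are equal
WHERE actors1 = actors2
// return titles only (movie1 and movie2 are illustrative aliases)
RETURN m1.title AS movie1, m2.title AS movie2
```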