I have data on movies that can be either comedies or dramas. I have actors in those movies, who can have multiple roles per movie. I want to find all distinct sets of movies and actors where:
(drama1:Movie {Genre:'Drama'})-[role1]-(actor1:Actor)-[role2]-(comedy:Movie {Genre:'Comedy'})-[role3]-(actor2:Actor)-[role4]-(drama2:Movie {Genre:'Drama'})
That is, I want to find cases where two different dramas are connected by a comedy with which each drama shares at least one actor. I'm struggling to do this efficiently and to get Neo4j to return distinct groups of drama1, drama2, actor1, actor2, comedy. My data is on the order of a few million nodes and tens of millions of relationships, so efficiency is important. A toy setup, which can be pasted into the Neo4j online console, is:
create
  (a:Movie {Genre:'Comedy'}), (b:Movie {Genre:'Comedy'}), (c:Movie {Genre:'Comedy'}), (d:Movie {Genre:'Comedy'}),
  (f:Movie {Genre:'Drama'}), (h:Movie {Genre:'Drama'}),
  (i:Actor {Name:'Sarah'}), (j:Actor {Name:'Maria'}), (k:Actor {Name:'Mike'}), (l:Actor {Name:'Jane'}),
  (m:Actor {Name:'Sam'}), (q:Actor {Name:'Matt'}), (r:Actor {Name:'Tom'}),
  (i)-[:ActedIn]->(a), (i)-[:ActedIn]->(a), (i)-[:ActedIn]->(a), (i)-[:ActedIn]->(a),
  (i)-[:ActedIn]->(f),
  (j)-[:ActedIn]->(b), (j)-[:ActedIn]->(h), (j)-[:ActedIn]->(h),
  (q)-[:ActedIn]->(c), (q)-[:ActedIn]->(b), (q)-[:ActedIn]->(a),
  (r)-[:ActedIn]->(f), (r)-[:ActedIn]->(f), (r)-[:ActedIn]->(a),
  (j)-[:ActedIn]->(b), (j)-[:ActedIn]->(c),
  (k)-[:ActedIn]->(d),
  (l)-[:ActedIn]->(c),
  (i)-[:ActedIn]->(a), (i)-[:ActedIn]->(h),
  (m)-[:ActedIn]->(h)
I've mostly tried variations of:
match (drama1:Movie {Genre:'Drama'})-[role1]-(actor1:Actor)-[role2]-(comedy:Movie {Genre:'Comedy'})-[role3]-(actor2:Actor)-[role4]-(drama2:Movie {Genre:'Drama'})
return drama1, actor1, comedy, actor2, drama2
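To make the "distinct groups" requirement concrete, a deduplicating variant of that query could look like the sketch below. It is only an illustration under stated assumptions: the `:ActedIn` type and arrow directions are taken from the toy setup, treating (drama1, drama2) and (drama2, drama1) as the same group is an assumption, and nothing here is claimed to perform well at the data sizes mentioned above.

```cypher
// Sketch only: same path shape as the query above, with two additions.
// Assumption: relationships are (:Actor)-[:ActedIn]->(:Movie), as in the toy setup.
MATCH (drama1:Movie {Genre:'Drama'})<-[:ActedIn]-(actor1:Actor)
      -[:ActedIn]->(comedy:Movie {Genre:'Comedy'})
      <-[:ActedIn]-(actor2:Actor)-[:ActedIn]->(drama2:Movie {Genre:'Drama'})
// Assumption: the order of the two dramas does not matter, so keep one ordering.
// This also excludes drama1 = drama2. id() is the legacy internal-id function;
// newer Neo4j versions also provide elementId().
WHERE id(drama1) < id(drama2)
// DISTINCT collapses rows that differ only in which parallel ActedIn
// relationship (that is, which role) was traversed.
RETURN DISTINCT drama1, actor1, comedy, actor2, drama2
```

Because a single MATCH never traverses the same relationship twice, a row with actor1 = actor2 only appears here when that actor has at least two `ActedIn` relationships to the comedy. If single-role actors should also count as the shared actor on both sides, the pattern would need to be split across two MATCH clauses.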