4
votes

I have a database with some movies. The movies are released in regions with a hierarchy. The hierarchy is like this (Global)-[contains]->(EU), (Global)-[contain]->(US), (EU)-[contains]->(UK), (EU)-[contains]->(SE).

I want a Cypher query that will return releases of movies in my region, or one of the regions higher up in the hierarchy.

If I am in the UK and a movie is released in the UK and in the EU, I want to return only the UK release. If it's released in EU, but does not have a specific UK release, I want to return the EU release.

The problem is how do I avoid duplicates.

My data has a structure like this and I want to return a single release for each movie

(Movie1)-[has_release]->(release1)-[has_region]->(EU) 
(Movie1)-[has_release]->(release2)-[has_region]->(Global)
(Movie2)-[has_release]->(release3)-[has_region]->(UK)
(Movie2)-[has_release]->(release4)-[has_region]->(US)

In this case, when I do a query for movies in the UK, I want to return release1 (and release3), because EU has a contains relationship to UK, but I don't want to return release2, because it has already found a release for Movie1, so I want to return the release which is closest in the region hierarchy to UK, in this case EU.

2
Would you also want to return movies that only have a global release?Nicole White

2 Answers

9
votes

This is a good question, and I enjoyed coming up with an answer. I'm going to walk through my solution step-by-step. First of all, here is the example data I'm testing this with:

CREATE 

(Global:Region {name:'Global'}),
(US:Region {name:'US'}),
(EU:Region {name:'EU'}),
(UK:Region {name:'UK'}),
(SE:Region {name:'SE'}),

(Global)-[:CONTAINS]->(EU),
(Global)-[:CONTAINS]->(US),
(EU)-[:CONTAINS]->(UK),
(EU)-[:CONTAINS]->(SE),

(Movie1:Movie {name:'Movie 1'}),
(Movie2:Movie {name:'Movie 2'}),
(Release1:Release {name:'Release 1'}),
(Release2:Release {name:'Release 2'}),
(Release3:Release {name:'Release 3'}),
(Release4:Release {name:'Release 4'}),

(Movie1)-[:HAS_RELEASE]->(Release1)-[:HAS_REGION]->(EU),
(Movie1)-[:HAS_RELEASE]->(Release2)-[:HAS_REGION]->(Global),
(Movie2)-[:HAS_RELEASE]->(Release3)-[:HAS_REGION]->(UK),
(Movie2)-[:HAS_RELEASE]->(Release4)-[:HAS_REGION]->(US);

And here is my solution...

MATCH p = (m:Movie)-[:HAS_RELEASE]->(:Release)-[:HAS_REGION]->(:Region)-[:CONTAINS*0..]->(:Region {name:'UK'})
WITH m, p
ORDER BY LENGTH(p)
WITH m, HEAD(COLLECT(p)) AS path
RETURN m.name AS Movie, [x IN NODES(path) WHERE x:Release | x.name] AS Release;

...which yields:

Movie    Release
Movie 1  Release 1
Movie 2  Release 3

OK so let's go through this query step-by-step. The first part...

MATCH p = (m:Movie)-[:HAS_RELEASE]->(:Release)-[:HAS_REGION]->(:Region)-[:CONTAINS*0..]->(:Region {name:'UK'})

...matches movies with a release in any region that contains the UK region (in any length). Note the *..0 means we are still capturing releases in the UK because that would be a 0-length step.

Then, for each movie, we want to order by path length, because for movies with multiple paths (like Movie 1), we want it so that the shortest paths are first...

WITH m, p
ORDER BY LENGTH(p)

...because we want to collect and keep only the path that is most direct to the UK node (which is the first path in the collection, since we ordered by path length ascending):

WITH m, HEAD(COLLECT(p)) AS path

And now we have a single path for each movie. The last line uses a combination of EXTRACT and FILTER to get the Release node names out of each path:

RETURN m.name AS Movie, [x IN NODES(path) WHERE x:Release | x.name] AS Release
2
votes
MATCH regions = (a:Region)-[:CONTAINS*]->(b:Region)
WHERE b.title = "UK"
WITH regions
MATCH (m:Movie {title: "The Matrix"})
WITH m, regions
MATCH p = (m)-[:HAS_RELEASE]->(rel:Release)-[:HAS_REGION]->(reg:Region)-[:CONTAINS*0..]->(regMin)
WHERE reg IN nodes(regions)
WITH rel
MATCH p = (a:Region)-[:CONTAINS*0..]->(b:Region)<-[:HAS_REGION]-(rel)
WITH COLLECT(p) AS paths, MAX(length(p)) AS maxLength
WITH FILTER(path IN paths WHERE length(path) = maxLength) as path
WITH path UNWIND path AS result
RETURN FILTER(p IN nodes(result) WHERE p:Release)