0
votes

I am using Neo4j and wondering how to use Cypher to loop through properties of other nodes connecting one node for comparison and filter the ones satisfy condition.

Here is the sample data:

Person Movie Publish_Date
Tina   A     2016-01-01
Tina   B     2016-01-01
Tina   C     2016-03-05
Tina   D     2016-03-06
Tina   X     2018-03-09  
Bob    E     2016-08-01
Bob    F     2016-08-08
Ana    G     2016-04-05
Ana    H     2016-08-05
Ana    I     2016-12-05

Here is what I want:

Person Movie Publish_Date
Tina   A     2016-01-01
Tina   B     2016-01-01
Tina   C     2016-03-05
Tina   D     2016-03-06
Tina   X     2018-03-09   
Bob    E     2016-08-01
Bob    F     2016-08-08

I want to return the person who participated in more than 2 movies published in 30 days and movie information.

What I thought to do is for each Person, loop through the publish date of movie nodes connecting with him and retain the ones satisfy the condition in the result table.

Here is my query for getting the sample data:

MATCH (p:Person)-[r1:ACTED_IN]->(m:Movie)
WITH p, m 
ORDER BY p.Name DESC, n.Publish_Date
RETURN p.name AS Person, m.title AS Movie, m.publish_date AS Publish_Date

Please suggest.

Thanks in advance!

2
Can you clarify the time constraints? Is this more than 2 movies published within the last 30 days, or more than two movies published within some 30 day timeframe? What about overlapping occurrences, where say movies 1 and 2 and 3 happen within a 30 day timeframe but movies 2 and 3 and 4 and 5 are also within a 30 day timeframe? And would that mean that you would want smaller timeframes within this too (2 and 3 and 4 as well)?InverseFalcon
Hi, thanks for asking! I want 2 movies published within some 30 day timeframe. For overlapping situation as you mentioned, I want 1, 2, 3, 4 and 5 as they all published in at least a 30 days period with another 2 moviesMAMS
You may want to adjust your data and results a bit, as this entry Tina X 2018-03-09 should not be returned, as it was published several years after the previous movie on 2016-03-06InverseFalcon
Also you might want to adjust your requirements, as you're returning Bob and two movies, but you specified in more than 2 movies published in 30 days. Do you mean "at least 2 movies published in 30 days"?InverseFalcon

2 Answers

0
votes

I'm taking a few liberties in reinterpreting your requirements, so there are some assumptions here you'll have to confirm for this to be valid.

I'm assuming you are looking for, per person, at least one set of 2 movies that were published within 30 days of each other (otherwise you would not be expecting Bob's entries in your results).

I'm also assuming you meant to have movie X's publish_date to be 2016-03-09 instead of 2018-03-09 otherwise it should not be in the expected results.

With those assumptions, this query should do the trick:

MATCH (p:Person)-[:ACTED_IN]->(m:Movie)
WITH p, m 
ORDER BY m.publish_date
WITH p, collect(m) as movies, collect(m.publish_date) as dates
UNWIND range(0, size(movies)-2) as index
WITH p, movies, dates, index 
WHERE duration.inDays(dates[index], dates[index+1]).days <= 30
UNWIND [movies[index], movies[index+1]] as movieInRange
RETURN DISTINCT p, movieInRange
ORDER BY p.name DESC

We UNWIND a range from 0 to 2 less than the size of the movies list per person so that we can do indexing into the lists (we will be evaluating movie dates in pairs so we can do the comparison).

For the adjacent pairs published within 30 days of each other, we UNWIND the collection of the adjacent movies so that the movies are both under the same variable, then we return the sorted DISTINCT values (since the same movie might occur twice, being within 30 days of both the movie before and after.

0
votes

This query will tell you whether any two movies from the same person are less than 30 days apart. From there you can filter as you wish:

MATCH (m1:Movie)<-[:ACTED_IN]-(p:Person)-[:ACTED_IN]->(m2:Movie) 
WITH p, m1,m2, datetime(m1.Publish_Date) as date1, datetime(m2.Publish_Date) as date2
return p.name,m1.title,m2.title,
CASE WHEN date1<date2
THEN date1+duration("P30D")>date2
ELSE date2+duration("P30D")>date1 END AS lessThan30DaysApart