Filter Relationships in Neo4j Using Start/End Dates

Question

I have the following graph model:

(p:Person)-[r:LINK {startDate: timestamp, endDate: timestamp}]->(c:Company)

A person can be linked to multiple companies at the same time and a company can have multiple people linking to it at the same time (i.e. there is a many-to-many relationship between companies and people).

The endDate property is optional and will only be present when a person has left a company.

I am trying to display a network of connections and can successfully return all related nodes from a person using the following Cypher query (this will display 2 levels of people connections):

MATCH (p:Person {id:<id>})-[r:LINK*0..4]-(l) RETURN *

What I now need to do is keep only the relationships whose timeframes overlap. For example, Person 1 worked at Company A between 01/01/2000 and 31/12/2002. Person 2 worked at Company A between 01/01/2001 and 30/06/2001. Person 3 has worked at Company A since 01/01/2005 and is still there. The results for Person 1 should include Person 2 but not Person 3.

This same logic needs to be applied to all levels of the graph (we allow the user to display 3 levels of connections) and relates to the parent node in each level, i.e. when displaying level 2, the dates for Person 2 and Person 3 should be used to filter their respective relationships.

Essentially, we are trying to do something similar to the LinkedIn connections but to filter based on people working at companies at the same time.

I have tried using the REDUCE function but cannot get the logic to work for the optional end date - can someone please advise how to filter the relationships based on the start and end dates?

Could you explain what you mean by multiple levels when you are searching for people at the same company with overlapping time? On LinkedIn, Person1 is related to Person2 and Person2 is related to Person3, so if you are looking at another company, it might tell you that Person3 is a level 2 connection. What you are trying to achieve is unclear without examples. - Himanshu Jain

InverseFalcon · Accepted Answer · 2018-10-05T23:48:27

It turns out there are 4 ways in which date ranges can overlap, but only 2 in which they do not (person 1 ends before person 2 starts, or person 2 ends before person 1 starts), so it is much simpler to check that neither of these no-overlap conditions holds. A missing endDate means the person is still at the company, so that relationship can never "end before" the other one starts.

In the level 1 case, this query should do the trick:

// Two people linked to the same company; exclude pairs whose date ranges cannot overlap.
MATCH (start:Person {id: 1})-[r1:LINK]->(c)<-[r2:LINK]-(suggest)
WHERE NOT ((r1.endDate IS NOT NULL AND r1.endDate < r2.startDate)
        OR (r2.endDate IS NOT NULL AND r2.endDate < r1.startDate))
RETURN suggest
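
To sanity-check the level 1 query, the example from the question can be loaded as a small fixture. This is only a sketch under assumptions: the `name` property and the use of `date()` values (available from Neo4j 3.4) are not in the original model, which stores timestamps; any type that compares with `<` should behave the same way. Person 2's end date is entered here as 30 June 2001, the last day of that month.

```cypher
// Hypothetical fixture built from the example in the question.
CREATE (a:Company {name: 'Company A'})
CREATE (p1:Person {id: 1})-[:LINK {startDate: date('2000-01-01'), endDate: date('2002-12-31')}]->(a)
CREATE (p2:Person {id: 2})-[:LINK {startDate: date('2001-01-01'), endDate: date('2001-06-30')}]->(a)
// Person 3 is still at the company, so no endDate is set.
CREATE (p3:Person {id: 3})-[:LINK {startDate: date('2005-01-01')}]->(a);

// Run as a second statement: the level 1 query against the fixture.
MATCH (start:Person {id: 1})-[r1:LINK]->(c)<-[r2:LINK]-(suggest)
WHERE NOT ((r1.endDate IS NOT NULL AND r1.endDate < r2.startDate)
        OR (r2.endDate IS NOT NULL AND r2.endDate < r1.startDate))
RETURN suggest.id;
```

With this fixture the query should return only id 2: Person 1's link ends (2002-12-31) before Person 3's link starts (2005-01-01), so Person 3 is filtered out, while Person 2's range falls inside Person 1's.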

The tricky part is applying this to multiple levels.

While we could create a single Cypher query to handle this dynamically, the evaluation of the relationships would only happen after expansion, not during, so it may not be the most efficient:

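// Expand up to 6 hops (3 levels of person-company-person) before filtering.
MATCH path = (start:Person {id: 1})-[:LINK*..6]-(suggest:Person)
// Pair up adjacent relationships along each path.
WITH path, start, suggest, apoc.coll.pairsMin(relationships(path)) AS pairs
// Keep the even-indexed pairs: two people linked to the same company.
WITH path, start, suggest, [index IN range(0, size(pairs)-1) WHERE index % 2 = 0 | pairs[index]] AS pairs
// Reject the path if any such pair fails to overlap.
WHERE none(pair IN pairs WHERE (pair[0].endDate IS NOT NULL AND pair[0].endDate < pair[1].startDate)
                            OR (pair[1].endDate IS NOT NULL AND pair[1].endDate < pair[0].startDate))
RETURN suggest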

Some of the highlights here...

We're using the apoc.coll.pairsMin() function from APOC Procedures to get pairs of adjacent relationships from the collection of relationships in each path. We're only interested in the pairs at even indexes (0, 2, 4, ...), which hold the two relationships from two people working at the same company. The pairs at odd indexes hold two relationships from the same person going to two different companies, so they tell us nothing about overlap.

So if we were executing on this pattern:

MATCH path = (start:Person)-[r1:LINK]->(c1)<-[r2:LINK]-(person2)-[r3:LINK]->(c2)<-[r4:LINK]-(person3)

Here apoc.coll.pairsMin(relationships(path)) would return [[r1, r2], [r2, r3], [r3, r4]]. As you can see, the relationships we need to consider are the ones linking 2 people to a company, which are at indexes 0 and 2 in the pairs list.
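
The pairing and index filtering can be tried in isolation. In this minimal sketch the strings 'r1' to 'r4' are placeholders standing in for the relationships of the path above, and `samePlacePairs` is just an illustrative alias; it requires the APOC library to be installed.

```cypher
// Placeholders instead of real relationships, to show the shape of the result.
WITH apoc.coll.pairsMin(['r1', 'r2', 'r3', 'r4']) AS pairs
RETURN pairs,
       // Keep only the pairs at even indexes (0, 2, ...).
       [index IN range(0, size(pairs) - 1) WHERE index % 2 = 0 | pairs[index]] AS samePlacePairs
```

This should return `pairs` as [['r1', 'r2'], ['r2', 'r3'], ['r3', 'r4']] and `samePlacePairs` as [['r1', 'r2'], ['r3', 'r4']], so the middle pair (one person, two companies) is dropped.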

Once we have those pairs, we need to ensure that every one of them along the path to a suggestion meets your criteria and overlaps (that is, matches neither of the no-overlap conditions).
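
For a fixed depth, the same check can be spelled out by hand, which may make the logic easier to follow. This is a sketch for exactly two levels, reusing the four-hop pattern above; the `DISTINCT` is an addition, since the same person may be reachable through several paths.

```cypher
// Level 2 only: both person-company-person pairs must overlap.
MATCH (start:Person {id: 1})-[r1:LINK]->(c1)<-[r2:LINK]-(person2)-[r3:LINK]->(c2)<-[r4:LINK]-(person3)
WHERE NOT ((r1.endDate IS NOT NULL AND r1.endDate < r2.startDate)
        OR (r2.endDate IS NOT NULL AND r2.endDate < r1.startDate))
  AND NOT ((r3.endDate IS NOT NULL AND r3.endDate < r4.startDate)
        OR (r4.endDate IS NOT NULL AND r4.endDate < r3.startDate))
RETURN DISTINCT person3
```

Note that the pair (r2, r3) is not compared: both relationships belong to person2, which corresponds to the odd-indexed pair skipped in the dynamic query.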
