I've got a graph whose nodes represent document versions, connected together in paths of versions. These paths can be connected by a second type of relationship that represents a change in the way documents were versioned. One problem with the graph is that the sources used to create it were not very clean, so I'm trying to write a query that adds a relationship giving a clean path of versions through the graph.

The part I'm stuck on is the following: say I've got two paths of nodes from two different versioning periods. These paths are connected by one or more relationships of the second type, each indicating that the document was ported to the new system. I want a query that takes the last node in the old path satisfying some conditions and connects it to the first node in the new path satisfying some other conditions.

For example, in the following graph I would want to connect (D) to (2), because (1) does not satisfy my set of conditions:

(A)-[:Version]->(B)-[:Version]->(C)-[:Version]->(D)
                 |               |
               Ported          Ported
                 |               |
(1)-[:Version]->(2)-[:Version]->(3)
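
To experiment with the queries below, the example graph could be built with something like the following. This is only a sketch: the `name` property, the direction of the `Link` relationships, and the use of `type: 'PORTED'` on them are assumptions made for illustration, and the properties the conditions rely on (`num`, `dateBegin`, `dateEnd`) are deliberately left out.

```cypher
// Hypothetical fixture for the diagram above.
// Assumptions: a `name` property, and :Link pointing from old to new.
create (a:Document {name: 'A'})-[:Version]->(b:Document {name: 'B'})
         -[:Version]->(c:Document {name: 'C'})-[:Version]->(d:Document {name: 'D'}),
       (n1:Document {name: '1'})-[:Version]->(n2:Document {name: '2'})
         -[:Version]->(n3:Document {name: '3'}),
       // the two "Ported" links drawn in the diagram: B to 2 and C to 3
       (b)-[:Link {type: 'PORTED'}]->(n2),
       (c)-[:Link {type: 'PORTED'}]->(n3)
```

On this graph, the relationship I am after would be (D)-[:VersionGlobal]->(2).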

I came up with different queries, but all of them fail in some cases.

This one fails because old documents were sometimes ported and split into multiple documents, which produces several new paths. My query selects only one 'new' node for each 'old' one, so some paths are ignored.

//match all the 'port' and 'ported' relations between old and new versioning system
match (new:Document)-[r:Link]-(old:Document) 
   where new.num =~'[A-Z].{4}-.*' and old.num =~'[A-Z].{3}-.*' and r.type in ['PORT','PORTED'] 
//find youngest one satisfying a condition, here a date
optional match(new)<-[:Version*]-(newAncestor:ArticleCode) 
   where newAncestor.dateBegin >= '2012-01-01'
with old, collect(new) + collect(newAncestor) as potentialNewVersions
unwind potentialNewVersions as potentialNew
with distinct old, potentialNew 
order by potentialNew.dateBegin , potentialNew.dateEnd 
with distinct old, collect(potentialNew)[0] as youngestNew

//find oldest one satisfying a condition
optional match(old) -[:Version *]->(oldChild:ArticleCode) 
   where oldChild.dateEnd <= youngestNew.dateBegin
with old, youngestNew, collect(old) + collect(oldChild) as potentialOldVersions
unwind potentialOldVersions as potentialOld
with distinct old, youngestNew, potentialOld 
order by potentialOld.dateEnd desc, potentialOld.dateBegin desc
with distinct youngestNew, collect(potentialOld)[0] as oldestOld

merge(youngestNew)<-[:VersionGlobal]-(oldestOld)

The second one is much simpler, but it selects too many 'new' nodes, because multiple versions can satisfy the date condition. In addition, it could fail if the only 'ported' relationship between the old and new paths is on a node dated before the limit date.

//this time I match all paths of new versions whose first node satisfies the condition
match p=(new:Document)-[:Version*0..]->(:Document)-[r:Link]-(old:ArticleCode) 
   where new.num =~'[A-Z].{4}-.*' and old.num =~'[A-Z].{3}-.*' and r.type in ['PORT','PORTED'] and new.dateBegin >= '2012-01-01' 
//take first node of each path
with distinct nodes(p)[0] as youngestNew, old
//find latest old node
optional match p=(old)-[:Version*0..]->(oldChild:ArticleCode) 
   where oldChild.dateEnd <= youngestNew.dateBegin
//keep youngestNew in scope so the merge below connects existing nodes
with distinct youngestNew, last(nodes(p)) as oldestOld
merge(youngestNew)<-[:VersionGlobal]-(oldestOld)

Thanks

1 Answer

I think we found an answer using optional matches and case expressions:

match (new:Document)-[:Version*0..]-(:Document)-[r:Link]-(:Document)-[:Version*0..]-(old:Document)
where *myConditions*

optional match (newAncestor:Document)-[:Version]->(new)
with distinct
case  
    when newAncestor.dateBegin < '2012-01-01' or newAncestor is null
    then new
end as youngestNew, old
where not(youngestNew is null)

optional match (old)-[:Version]->(oldChild:Document)
with distinct
youngestNew,
case 
    when oldChild.dateBegin > youngestNew.dateBegin or oldChild is null
    then old
end as oldestOld
where not(oldestOld is null)

*merge part*
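
To check the outcome afterwards, a query along these lines could list the pairs that were connected. It is only a sketch, and it assumes the merge step creates the `VersionGlobal` relationship from the old node to the new one, as in the question, and that documents carry the `num` property used in the conditions there.

```cypher
// List every old/new pair joined by the merge step.
// Assumption: the relationship type is VersionGlobal, directed old -> new.
match (old:Document)-[:VersionGlobal]->(new:Document)
return old.num as oldNum, new.num as newNum
order by oldNum, newNum
```

An old document that was split into several new ones should show up on several rows, one for each new path.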