I have a graph whose nodes represent document versions, connected into paths of successive versions. These paths can be linked by a second type of relationship that represents a change in the way documents were versioned. One problem with the graph is that the sources used to build it were not very clean, which is why I'm trying to write a query that adds a relationship giving a clean path of versions through the graph.
The part I'm stuck on is the following: say I have two paths of nodes from two different versioning periods. These paths are connected by one or more relationships of the second type, which indicate that the document was ported to the new system. I want a query that takes the last node satisfying some conditions in the old path and connects it to the first node satisfying some other conditions in the new path.
For example, in the following graph I would want to connect (D) to (2), because (1) does not satisfy my set of conditions:
(A)-[:Version]->(B)-[:Version]->(C)-[:Version]->(D)
| |
Ported Ported
| |
(1)-[:Version]->(2)-[:Version]->(3)
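To make this easier to reproduce, here is a minimal sketch of sample data for the graph above. Everything in it is an assumption made for illustration: the `name` property, the `num` values, the dates, the double `Document`/`ArticleCode` labels, and the choice to attach the two ported links to (C) and (D). It only tries to match the labels, relationship types and properties used in my queries.

```cypher
// Hypothetical sample data: all property values below are invented.
// Old nodes use a num like 'L123-1', new nodes a num like 'L1234-1'.
create (a:Document:ArticleCode {name: 'A', num: 'L123-1', dateBegin: '2005-01-01', dateEnd: '2007-01-01'}),
       (b:Document:ArticleCode {name: 'B', num: 'L123-1', dateBegin: '2007-01-01', dateEnd: '2009-01-01'}),
       (c:Document:ArticleCode {name: 'C', num: 'L123-1', dateBegin: '2009-01-01', dateEnd: '2011-06-01'}),
       (d:Document:ArticleCode {name: 'D', num: 'L123-1', dateBegin: '2011-06-01', dateEnd: '2012-01-01'}),
       (n1:Document:ArticleCode {name: '1', num: 'L1234-1', dateBegin: '2011-06-01', dateEnd: '2012-01-01'}),
       (n2:Document:ArticleCode {name: '2', num: 'L1234-1', dateBegin: '2012-01-01', dateEnd: '2013-01-01'}),
       (n3:Document:ArticleCode {name: '3', num: 'L1234-1', dateBegin: '2013-01-01', dateEnd: '2014-01-01'}),
       // version paths inside each versioning system
       (a)-[:Version]->(b), (b)-[:Version]->(c), (c)-[:Version]->(d),
       (n1)-[:Version]->(n2), (n2)-[:Version]->(n3),
       // ported links between the two systems (assumed endpoints and direction)
       (c)-[:Link {type: 'PORTED'}]->(n1),
       (d)-[:Link {type: 'PORTED'}]->(n2)
```

With this data, (1) starts before '2012-01-01' and so fails the date condition, and the result I'm after is a single (D)-[:VersionGlobal]->(2) relationship.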
I came up with different queries, but all of them fail in some cases.
This one fails because old documents were sometimes ported and split into multiple documents, which means several new paths, but my query selects only one 'new' node for each 'old' one and therefore ignores some paths.
//match all the 'port' and 'ported' relations between old and new versioning system
match (new:Document)-[r:Link]-(old:Document)
where new.num =~'[A-Z].{4}-.*' and old.num =~'[A-Z].{3}-.*' and r.type in ['PORT','PORTED']
//find youngest one satisfying a condition, here a date
optional match (new)<-[:Version*]-(newAncestor:ArticleCode)
where newAncestor.dateBegin >= '2012-01-01'
with old, collect(new) + collect(newAncestor) as potentialNewVersions
unwind potentialNewVersions as potentialNew
with distinct old, potentialNew
order by potentialNew.dateBegin , potentialNew.dateEnd
with distinct old, collect(potentialNew)[0] as youngestNew
//find oldest one satisfying a condition
optional match (old)-[:Version*]->(oldChild:ArticleCode)
where oldChild.dateEnd <= youngestNew.dateBegin
with old, youngestNew, collect(old) + collect(oldChild) as potentialOldVersions
unwind potentialOldVersions as potentialOld
with distinct old, youngestNew, potentialOld
order by potentialOld.dateEnd desc, potentialOld.dateBegin desc
with distinct youngestNew, collect(potentialOld)[0] as oldestOld
merge (youngestNew)<-[:VersionGlobal]-(oldestOld)
The second one is much simpler but selects too many nodes for the 'new' side, as multiple versions can satisfy the date condition. In addition, it could fail if the only 'ported' relationship between the old and new paths is on a node dated before the limit date.
//this time I match all paths of new versions whose first node satisfies the condition
match p=(new:Document)-[:Version*0..]->(:Document)-[r:Link]-(old:ArticleCode)
where new.num =~'[A-Z].{4}-.*' and old.num =~'[A-Z].{3}-.*' and r.type in ['PORT','PORTED'] and new.dateBegin >= '2012-01-01'
//take first node of each path
with distinct nodes(p)[0] as youngestNew, old
//find latest old node
optional match p=(old)-[:Version*0..]->(oldChild:ArticleCode)
where oldChild.dateEnd <= youngestNew.dateBegin
//keep youngestNew in scope, otherwise the merge below would create a new node for it
with distinct youngestNew, last(nodes(p)) as oldestOld
merge (youngestNew)<-[:VersionGlobal]-(oldestOld)
Thanks