0
votes

I have built a database to store data on moving vehicles that have been divided into stop and trip events - so two node types are Stop and Trip. Stop nodes have attributes including the duration of the stop and its location and are connected to the next stop by that vehicle by -[:NEXT_STOP]->. Naturally there is only one relationship of this type per stop node. A stop by definition is 1 hour long and a trip by definition is what is between stops. I am trying to build a query that can aggregate stops (and subsequently the intermediate trips) into larger units. That is a user may wish to define a "tour" that incorporates several trips and stops that begin and end at specified locations and are bookended by stops of longer duration, say 8 hours.

I can describe this quite easily in pseudo verbally. Say the user wants all tours (where a tour is bookended by stops of 8 hours) between Sydney and Melbourne. Then the process would finding all Stop nodes where location is Sydney and duration is over 8 hours. Then, for each of these, checking the NEXT_STOP Stop node. If that Stop is less than 8 hours in duration, then checking its next node in turn until we reach a node with a duration of over 8 hours. It would then filter these terminal nodes to retrieve only those in Melbourne. Ideally, there would be edge conditions defined to stop endless loops, for instance stopping once they reach stops that are a certain period removed from the start.

I have been unable to implement this in Cypher however. The recursive aspects are straight forward, but I cannot work out how to stipulate the conditions for intermediate nodes (ie, duration less than 8 hours) or make the path found non-greedy. For instance, if I start with something like

WITH 4*3600 as maxdur 
MATCH (s1:Stop{location:'Sydney'})-[:NEXT_STOP *1..]->(s2:Stop) 
WHERE s1.duration > maxdur and s2.duration > maxdur

It matches the longest possible path - for instance several tours joined together because it is not checking where intermediate stops are sufficiently short.

The shortestPath algorithms are no help here because they will match the shortest path of all starting nodes subject to the conditions, rather than the shortest path for each of the starting nodes.

I have a way of doing this off-server, especially if I predetermine the desired tour parameters, but this seems like a problem naturally suited to graph databases.

1

1 Answers

0
votes

This snippet may work for you:

WITH 4*3600 as maxdur
MATCH p=(s1:Stop{location:'Sydney'})-[:NEXT_STOP*]->(s2:Stop)
WHERE
  s1.duration > maxdur AND s2.duration > maxdur AND
  NONE(n IN NODES(p)[1..-1] WHERE n.duration > maxdur)

NODES(p)[1..-1] is used to get the interior nodes in each candidate path. The NONE function is used to filter out paths in which an interior node has a duration exceeding maxdur.