
I have a graph database that maps out connections between buildings and bus stations, where the graph contains other connecting pieces like roads and intersections (among many node types).

What I'm trying to figure out is how to filter a path down to only return specific node types. I have two related questions that I'm currently struggling with.

Question 1: How do I return the labels of nodes along a path?

It seems like a logical first step is to determine what types of nodes occur along the path.

I have tried the following:

MATCH p=(a:Building)-[:CONNECTED_TO*..5]-(b:Bus)
WITH nodes(p) AS nodes
RETURN DISTINCT labels(nodes);

However, I get a type exception saying that labels() expects a Node, not a Collection. I'd like to find out dynamically which types of nodes are on my paths so that I can eventually filter those paths.

Question 2: How can I return a subset of the nodes in a path that match a label I identified in the first step?

Say I found that between (a:Building) and (b1:Bus) or (b2:Bus) I can expect to find (:Intersection) and (:Street) nodes.

This is a simplified model of my graph:

(a:Building)--(:Street)--(:Street)--(b1:Bus)
             \(:Street)--(:Intersection)--(:Street)--(b2:Bus)
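The simplified model can be reproduced for testing with a CREATE statement along these lines. The `name` property values are hypothetical placeholders. The relationship directions are arbitrary, because the queries here match :CONNECTED_TO without a direction.

```cypher
// Hypothetical sample data mirroring the simplified model above.
// `name` values are placeholders; direction is arbitrary since the
// queries traverse :CONNECTED_TO undirected.
CREATE (a:Building {name: 'a'}),
       (s1:Street {name: 's1'}),
       (s2:Street {name: 's2'}),
       (b1:Bus {name: 'b1'}),
       (s3:Street {name: 's3'}),
       (i:Intersection {name: 'i'}),
       (s4:Street {name: 's4'}),
       (b2:Bus {name: 'b2'}),
       (a)-[:CONNECTED_TO]->(s1),
       (s1)-[:CONNECTED_TO]->(s2),
       (s2)-[:CONNECTED_TO]->(b1),
       (a)-[:CONNECTED_TO]->(s3),
       (s3)-[:CONNECTED_TO]->(i),
       (i)-[:CONNECTED_TO]->(s4),
       (s4)-[:CONNECTED_TO]->(b2);
```

With this data there are exactly two :Building-to-:Bus paths: one through two Streets to b1, and one through Street, Intersection and Street to b2.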

I've written a MATCH statement that looks for all possible paths between (:Building) and (:Bus) nodes. What do I need to add to return only the Street nodes?

MATCH p=(a:Building)-[r:CONNECTED_TO*]-(b:Bus)
  // Insert logic to only return (:Street) nodes from p

Any guidance on this would be greatly appreciated!


2 Answers

  1. To get the distinct labels along matching paths:

    MATCH p=(a:Building)-[:CONNECTED_TO*..5]-(b:Bus)
    WITH NODES(p) AS nodes
    UNWIND nodes AS n
    WITH LABELS(n) AS ls
    UNWIND ls AS label
    RETURN DISTINCT label;
    
  2. To return the nodes that have the Street label:

    MATCH p=(a:Building)-[r:CONNECTED_TO*]-(b:Bus)
    WITH NODES(p) AS nodes
    UNWIND nodes AS n
    WITH n
    WHERE 'Street' IN LABELS(n)
    RETURN n;
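
One possible refinement of query 2, offered as a sketch. A Street that lies on several matching paths is returned once per path, so adding DISTINCT returns each Street only once. The label predicate `n:Street` is an equivalent way to write `'Street' IN LABELS(n)`. The depth cap of 5 is carried over from query 1.

```cypher
// Return each Street that lies on any matching path exactly once.
MATCH p=(a:Building)-[:CONNECTED_TO*..5]-(b:Bus)
UNWIND NODES(p) AS n
WITH n
WHERE n:Street
RETURN DISTINCT n;
```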
    

Cybersam's answers are good, but their output is just a flat column, so the path information is lost completely. If there are multiple paths from a :Building to a :Bus, the first query outputs the labels of all nodes on all paths combined. You can't tell how many paths exist, which labels appear on some paths but not others, or which labels some paths have in common.

Likewise, the second query loses path information. If multiple paths use different streets to get from a :Building to a :Bus, cybersam's query returns every street involved in any of those paths. It could end up returning every street in your graph, which doesn't seem very useful.

You need queries that preserve path information.

For question 1, to find the distinct labels on the nodes of each path, I would suggest this query:

MATCH p=(:Building)-[:CONNECTED_TO*..5]-(:Bus)
WITH NODES(p) AS nodes
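// Append only labels not yet collected, so each path's list has no repeats
WITH REDUCE(myLabels = [], node IN nodes | myLabels + [l IN labels(node) WHERE NOT l IN myLabels]) AS myLabels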
RETURN DISTINCT myLabels
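If the order of node types along each path matters, one variant (a sketch, not part of the original answer) keeps one label list per node in traversal order. It returns the endpoints and path length alongside that list, so each row still corresponds to exactly one path:

```cypher
// One row per matching path: its endpoints, its length in hops,
// and the labels of every node in traversal order.
MATCH p=(a:Building)-[:CONNECTED_TO*..5]-(b:Bus)
RETURN a, b, LENGTH(p) AS hops,
       [node IN NODES(p) | LABELS(node)] AS labelsPerNode
```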

For question 2, this query preserves path information:

MATCH p=(:Building)-[:CONNECTED_TO*..5]-(:Bus)
WITH NODES(p) AS nodes
// List comprehension; the FILTER() function was removed in Neo4j 4.0
WITH [node IN nodes WHERE node:Street] AS pathStreets
RETURN pathStreets
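The stated goal was eventually to filter paths by the node types found on them. The same list-comprehension approach can be combined with a path predicate for that. This sketch keeps only paths that pass through at least one :Intersection, which is an illustrative condition rather than something the question specifies, and returns the streets on each remaining path:

```cypher
// Keep only paths containing an Intersection node (illustrative condition),
// then return the Street nodes of each remaining path.
MATCH p=(:Building)-[:CONNECTED_TO*..5]-(:Bus)
WHERE ANY(node IN NODES(p) WHERE node:Intersection)
RETURN [node IN NODES(p) WHERE node:Street] AS pathStreets
```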

Note that both of these are expensive operations: like the queries in your description, they perform a cartesian product of all buildings and all buses. I highly recommend narrowing down the buildings and buses you match on, ideally to a few specific buildings at least.
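For example, anchoring the match on a single building might look like the following sketch. The `name` property and the `$buildingName` parameter are hypothetical. Substitute whatever property uniquely identifies your buildings, ideally one backed by an index.

```cypher
// Start from one specific building instead of every :Building.
// `name` and $buildingName are hypothetical placeholders.
MATCH p=(a:Building {name: $buildingName})-[:CONNECTED_TO*..5]-(b:Bus)
RETURN b, [node IN NODES(p) WHERE node:Street] AS pathStreets
```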

I also encourage limiting how deep your pattern searches. It sounds like many, if not most, of the nodes in your graph are connected by :CONNECTED_TO relationships. If you don't cap the depth at a reasonable value, your query could find every single path through your entire graph, however long, convoluted, or nonsensical, and I don't think that's what you want.
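If you only need the most direct route from a building to each bus station, Cypher's `shortestPath()` is one alternative. It is a swapped-in technique, not part of the queries above, and returns a single shortest path per building/bus pair instead of enumerating every path up to the cap. Here is a sketch, again assuming the hypothetical `name` property; the 10-hop bound is an arbitrary example value.

```cypher
// One shortest path (up to 10 hops, an arbitrary bound) from a given
// building to each reachable bus station.
// `name` and $buildingName are hypothetical placeholders.
MATCH (a:Building {name: $buildingName}), (b:Bus)
MATCH p = shortestPath((a)-[:CONNECTED_TO*..10]-(b))
RETURN b, LENGTH(p) AS hops,
       [node IN NODES(p) WHERE node:Street] AS pathStreets
```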