1
votes

While working on a rdbms data lineage project, I am faced with a Neo4J Cypher challenge that I cannot solve:

This is the example:

  1. A graph consist of nodes that represent database tables (label:TABLE) or views (label:VIEW)
  2. Tables, or other views source to views via the directional -[:SOURCES]-> relation
  3. A subset of the graph has 10 TABLE nodes that optionally can be FILTERED (excluded from the graph). They are named A1 to A10 via the name property. These nodes have the additional label FILTERNODE
  4. These 10 nodes link to VIEW node B via the SOURCES relation, example: A1-[:SOURCES]->B
  5. Node B in its turn sources VIEW node C: B-[:SOURCES]->C
  6. Node C has an array property for filtering: C.filter=['A2','A3']
  7. Node C in its turn sources VIEW node D: B-[:SOURCES]->C

Now I want to query the chain from the A* nodes to D, not showing all the A* nodes, but ONLY those that are filtered by filter property on node C.

I guess this needs to be done in multiple steps:

  1. First select the whole subgraph, containing A1..A10,B,C,D
  2. Then uniquely collect all filtering properties from this subgraph
  3. Use the collected filtering properties on nodes with the FILTERNODE property to only retain A2,A3,B,C and D

How do I accomplish this in Cypher?

2
What are your inputs for this query? Are you starting with something that identifies node C (or D?) Or do you mean to query a node that has no outgoing :SOURCES relationship, find its filter, and match the path back to those sources? Or is it a different approach?InverseFalcon
I traverse de graph 'back' to the input sources, meaning: start from node C, to node B (that sources towards c), to nodes A1..10 that each source towards B. But instead of showing the A1 to A10 nodes I want to only show A2 and A3 (because of the filter property on node C)Rogier Werschkull
Okay, sounds good. What labels are you using here? Is there a common set of labels for the source nodes (your A nodes)?InverseFalcon
Yes, the A nodes all have the :TABLE label. The values in the C.filter property would be the unique keys of those nodes (their name property)Rogier Werschkull
So, I'm still a little confused about what you want. It sounds like you want something like MATCH (c{id:"start"}), (a) WHERE a.name in c.filters, with maybe AND (a)-[:SOURCES*]->(c) to double check they are actually connected (with an index on a.name of course). And maybe an additional OPTIONAL MATCH (a)-[:SOURCES*]->(b)-[:SOURCES*]->(c) to pick up the in-between. For the bonus challenge, is that saying C can filter upstream too? The other way I read this is you want to collect all the filters in the graph, and then return all the nodes in the filters, and their in-between.Tezra

2 Answers

1
votes

Here is the Cypher reference card; But basically you want to collect multiple arrays from properties, combine them to be one list, and than use that as a filter. The main things you need to combine everything into one list is EXTRACT+REDUCE, with a filter to protect yourself from null values (property isn't set).

Here is a Cypher showing how to do extract+reduce as one clean step, with comments explaining each step of the Cypher.

// Match our starting point, and collect all child views (0.. collects itself too)
MATCH (:TABLE)-[:SOURCES]->(start:View)-[:SOURCES*0..25]->(view:View)
// Make sure we only have one copy of each view
WITH COLLECT(DISTINCT view) as views
// Collect all filters (if they exist) into one list
WITH views, reduce(s = [], v IN [x IN views WHERE EXISTS(x.filter) | x.filter]| s + v.filter) as filters
// Match all tables in filter list
MATCH (a:TABLE)
WHERE a.name IN filters
// Format return; Everything in one column
WITH views+COLLECT(a) as ns
UNWIND ns as n
RETURN n
0
votes

So you're starting from a specific node C, need to traverse back and get the :TABLE nodes that act as the sources, but only those filtered by the list property on C.

This should work:

MATCH (c)
WHERE id(c) = 123 // standin for however you match to your starting node
WITH c, c.filter as allowed
MATCH (a:TABLE)-[:SOURCES*]->(c)
WHERE a.name in allowed
RETURN a

If you have lots of input nodes, and :TABLE(name) is unique, then we can modify this to pre-match on the possible input nodes, letting us do node comparisons rather than property comparisons:

MATCH (c)
WHERE id(c) = 123 // standin for however you match to your starting node
MATCH (a:TABLE)
WHERE a.name in c.filter
WITH c, collect(a) as allowed
MATCH (a:TABLE)-[:SOURCES*]->(c)
WHERE a in allowed
RETURN a