While working on a rdbms data lineage project, I am faced with a Neo4J Cypher challenge that I cannot solve:
This is the example:
- A graph consist of nodes that represent database tables (label:TABLE) or views (label:VIEW)
- Tables, or other views source to views via the directional -[:SOURCES]-> relation
- A subset of the graph has 10 TABLE nodes that optionally can be FILTERED (excluded from the graph). They are named A1 to A10 via the name property. These nodes have the additional label FILTERNODE
- These 10 nodes link to VIEW node B via the SOURCES relation, example: A1-[:SOURCES]->B
- Node B in its turn sources VIEW node C: B-[:SOURCES]->C
- Node C has an array property for filtering: C.filter=['A2','A3']
- Node C in its turn sources VIEW node D: B-[:SOURCES]->C
Now I want to query the chain from the A* nodes to D, not showing all the A* nodes, but ONLY those that are filtered by filter property on node C.
I guess this needs to be done in multiple steps:
- First select the whole subgraph, containing A1..A10,B,C,D
- Then uniquely collect all filtering properties from this subgraph
- Use the collected filtering properties on nodes with the FILTERNODE property to only retain A2,A3,B,C and D
How do I accomplish this in Cypher?
MATCH (c{id:"start"}), (a) WHERE a.name in c.filters
, with maybeAND (a)-[:SOURCES*]->(c)
to double check they are actually connected (with an index on a.name of course). And maybe an additionalOPTIONAL MATCH (a)-[:SOURCES*]->(b)-[:SOURCES*]->(c)
to pick up the in-between. For the bonus challenge, is that saying C can filter upstream too? The other way I read this is you want to collect all the filters in the graph, and then return all the nodes in the filters, and their in-between. – Tezra