
I'm new(ish) to Neo4j and I'm building a tool that lets users specify, from a UI, a path of nodes they would like to query Neo4j for. For each node in the path they can specify properties of that node; generally they don't care about relationship types or properties. The relationships need to be variable in length because the typical use case is that users have a start node and want to know whether it reaches some end node, without caring about (all of) the intermediate nodes between the two.

Some restrictions apply when the user builds the path in the UI: it can't have cycles, no node can have more than one child that itself has children, and no node can have more than one incoming edge. These are only enforced from the user's perspective, not in the query itself.

The issue I'm having is being able to specify filtering on each level of the path without getting strange behavior.

I've tried a lot of variations of my Cypher query such as breaking up the path into multiple MATCH statements, tinkering with the relationships and anything else I could think of.

Here is a Gist of a sample Cypher dump: cypher-dump

This query gives me the path that I'm trying to get; however, it doesn't filter on name or type for n_four.

    MATCH path = (n_one)-[*0..]->(n_two)-[*0..]->(n_three)-[*0..]->(n_four)
    WHERE n_one.type IN ["JCL_JOB"]
    AND n_two.type IN ["JCL_PROC"]
    AND n_three.name IN ["INPA", "OUTA", "PRGA"]
    AND n_three.type IN ["RESOURCE_FILE", "COBOL_PROGRAM"]
    RETURN path

CorrectGraph

This is the query I'd like to work; however, it omits the leaf nodes at the third level, and I'm having trouble understanding why.

    MATCH path = (n_one)-[*0..]->(n_two)-[*0..]->(n_three)-[*0..]->(n_four)
    WHERE n_one.type IN ["JCL_JOB"]
    AND n_two.type IN ["JCL_PROC"]
    AND n_three.name IN ["INPA", "OUTA", "PRGA"]
    AND n_three.type IN ["RESOURCE_FILE", "COBOL_PROGRAM"]
    AND n_four.name IN ["TAB1", "TAB2", "COPYA"]
    AND n_four.type IN ["RESOURCE_TABLE", "COBOL_COPYBOOK"]
    RETURN path

IncorrectGraph

What I've noticed is that when I end the query with "... RETURN n_four", the result includes nodes that are at the third level as well.

Your queries do not seem to have anything to do with the stated restrictions. And in general it is not clear what you are trying to do. – cybersam

@cybersam The restrictions are on the user side, where the user builds a diagram to represent the call-chain relationship they'd like to see; they are not part of the query itself. What I am trying to do is generate a query of arbitrary length of parent -> ... -> child nodes, where at each child level I can specify filters on properties of the nodes. – ZachSand

Can you provide the Cypher queries to create the data to demonstrate the issue? – cybersam

I updated the post with a Gist link to a Cypher dump of a sample database that demonstrates the issue I'm having with the query. – ZachSand

1 Answer


This behavior is caused by your (probably inappropriate) use of [*0..] in your MATCH pattern.

FYI:

  • [*0..] matches 0 or more relationships. For instance, (a)-[*0..]->(b) would succeed even if a and b are the same node (and there is no relationship from that node back to itself).

  • The default lower bound is 1. So [*] is equivalent to [*..] and [*1..].
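The difference between the two lower bounds can be seen on a tiny throwaway graph. The following is only a sketch: the `Demo` label and the `name` value are hypothetical and are not part of the data in the question.

```cypher
// Hypothetical throwaway data: a single node with no relationships at all.
CREATE (:Demo {name: "solo"});

// Lower bound 0: the pattern matches with zero hops,
// so a and b are bound to the same node and one row comes back.
MATCH (a:Demo)-[*0..]->(b:Demo)
RETURN a.name, b.name, a = b AS same_node;

// Default lower bound 1: at least one relationship must be traversed,
// and there is none, so no rows come back.
MATCH (a:Demo)-[*]->(b:Demo)
RETURN a.name, b.name;
```

Run each statement separately if your client does not accept multiple statements at once.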

Your 2 queries use the same MATCH pattern, ending in ...->(n_three)-[*0..]->(n_four).

  • Your first query does not specify any WHERE tests for n_four, so the query is free to return paths in which n_three and n_four are the same node. This lack of specificity is why the query is able to return 2 extra nodes.

  • Your second query specifies WHERE tests for n_four that make it impossible for n_three and n_four to be the same node. The query is now more picky, and so those 2 extra nodes are no longer returned.
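One way to check this against the sample data is to return a flag that says whether `n_three` and `n_four` were bound to the same node. This is a diagnostic sketch based on the first query from the question; the aliases `third`, `fourth`, and `zero_hop` are made up for illustration.

```cypher
// Same pattern and filters as the first query, but instead of the path
// return which node pairs were matched and whether the last hop was empty.
MATCH (n_one)-[*0..]->(n_two)-[*0..]->(n_three)-[*0..]->(n_four)
WHERE n_one.type IN ["JCL_JOB"]
  AND n_two.type IN ["JCL_PROC"]
  AND n_three.name IN ["INPA", "OUTA", "PRGA"]
  AND n_three.type IN ["RESOURCE_FILE", "COBOL_PROGRAM"]
RETURN DISTINCT
  n_three.name AS third,
  n_four.name AS fourth,
  n_three = n_four AS zero_hop  // true when the final [*0..] matched zero relationships
```

Rows with `zero_hop = true` are the ones that disappear once `n_four` gets its own filters.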

You should not use [*0..] unless you are sure you want to optionally match 0 relationships. It can also add unnecessary overhead. And, as you now know, it also makes the query a bit trickier to understand.
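For comparison, here is the second query from the question with every lower bound raised to 1. This is a sketch that assumes each level in the UI path is meant to be a distinct node; it has not been run against the sample dump.

```cypher
// [*1..] (equivalent to [*]) requires at least one relationship per hop,
// so n_one, n_two, n_three and n_four can no longer collapse into the same node.
MATCH path = (n_one)-[*1..]->(n_two)-[*1..]->(n_three)-[*1..]->(n_four)
WHERE n_one.type IN ["JCL_JOB"]
  AND n_two.type IN ["JCL_PROC"]
  AND n_three.name IN ["INPA", "OUTA", "PRGA"]
  AND n_three.type IN ["RESOURCE_FILE", "COBOL_PROGRAM"]
  AND n_four.name IN ["TAB1", "TAB2", "COPYA"]
  AND n_four.type IN ["RESOURCE_TABLE", "COBOL_COPYBOOK"]
RETURN path
```

Note that this still returns only paths that reach a matching `n_four`; third-level nodes with no qualifying descendant are not part of any such path.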