Cypher directionless query not returning all expected paths

Question

I have a cypher query that starts from a machine node, and tries to find nodes related to it using any of the relationship types I've specified:

match p1=(n:machine)-[:REL1|:REL2|:REL3|:PERSONAL_PHONE|:MACHINE|:ADDRESS*]-(n2)
where n.machine="112943691278177215"
optional match p2=(n2)-[*]->()
return p1,p2
limit 300

The optional match clause is my attempt to traverse outwards in my model from each of the nodes found in p1. The below screenshot shows the part of the results I'm having issues with:

You can see from the starting machine node, it finds a personal_phone node via two app nodes related to the machine. For clarification, this part of the model is designed like so:

So it appeared to be working until I realized that certain paths were somehow being left out of the results. If I run a second query showing me all apps related to that particular personal_phone node, I get the following:

match p1=(n:personal_phone)<-[*]-(n2)
where n.personal_phone="(xxx) xxx-xxxx"
return p1
limit 100

The two apps I have segmented out are the two apps shown in the earlier image.

So why doesn't my original query show the other 7 apps related to the personal_phone?

EDIT: Despite the overly broad optional match combined with the limit 300 statement, the returned results show only 52 nodes and 154 rels. This is because paths that only follow outgoing relationships terminate very quickly in my model. I could have capped the depth (e.g. [*..2]) but was being lazy.

EDIT 2: The query I finally came up with to give me what I want is this:

match p1=(m:machine)<-[:MACHINE]-(a:app)
where m.machine="112943691278177215"
optional match p2=(a:app)-[:REL1|:REL2|:REL3|:PERSONAL_PHONE|:MACHINE|:ADDRESS*0..3]-(n)
where a<>n and a<>m and m<>n
optional match p3=(n)-[r*]->(n2)
where n2<>n
return distinct n, r, n2

This returns 74 nodes and 220 rels which seems to be the correct result (387 rows). So it seems like my incredibly inefficient query was the reason the graph was being truncated. Not only were the nodes being traversed many times, but the paths being returned contained duplicate information which consumed the limited rows available for return. I guess my new questions are:

1. When following multiple hops, should I always explicitly make sure the same nodes aren't traversed via where clauses?
2. If I were to return p3 instead, it returns 1941 rows to display 74 nodes and 220 rels. There seems to be a lot of duplication present. Is it typically better to use return distinct (like I have above) or is there a way to easily dedupe the nodes and relationships within a path?

I have a feeling it's related to my starting point having a :machine label but I'm still unable to modify the query to get the expected results. — Robert Penridge
So maybe I don't understand entirely, but when you do optional match p2=(n2)-[*]->() it seems you're asking for your entire connected graph component (any number of hops, to any kind of node). When you then pair that with LIMIT 300, is it possible what you're expecting to see just isn't in the first 300 items? That optional match seems excessively broad, in that it grabs an entire connected component. Your references to your model classes suggest that lack of specificity is odd. — FrobberOfBits
Yes, so it would seem the extra apps should be in the result set, but you're returning/limiting the paths that come back. So what's missing here is how many other results there are in the first 300, and the overall size of your DB — FrobberOfBits
@FrobberOfBits Ah good point. I don't think that's the issue but I'll update the question to explain why. — Robert Penridge
So I just tried limiting it to [*..2], [*..3], and [*..4] respectively. I saw more relationships returned when changing from 2 to 3, but 4 was the same as 3, so they do appear to be terminating as expected. Also, adding a limit of 2000 did not change the result set. — Robert Penridge

FrobberOfBits · Accepted Answer · 2015-07-29T15:56:51

So part of your issue here (updated questions) is that you're returning paths, and not individual nodes/relationships.

For example, if you do MATCH p=(n)-[*]-() and your data is A->B->C->D, then the results you'll get will be A->B, A->B->C, A->B->C->D and so on: each longer path repeats everything in the shorter ones. If on the other hand you did MATCH (n)-[r*]-(m) and then worked with r and m, you could get the same data, but deal with the distinct things on the path rather than have to sort that out later.

It seems you want the nodes and relationships, but you're asking for the paths - so you're getting them. ALL of them. :)
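
To make the overlap concrete, here is a minimal Python sketch (an illustration only, not Cypher and not how Neo4j actually evaluates a match). It assumes the A->B->C->D chain from the example, follows relationships in their stored direction from a single start node, and never reuses a relationship within one path. The helper name paths_from is hypothetical.

```python
from typing import List, Tuple

Edge = Tuple[str, str]  # (source, target); assumes no parallel edges


def paths_from(start: str, edges: List[Edge]) -> List[List[Edge]]:
    """Enumerate every directed path of length >= 1 from `start`,
    never reusing an edge within a single path."""
    results: List[List[Edge]] = []

    def extend(node: str, path: List[Edge]) -> None:
        for edge in edges:
            if edge[0] == node and edge not in path:
                new_path = path + [edge]
                results.append(new_path)   # every prefix is its own result row
                extend(edge[1], new_path)

    extend(start, [])
    return results


edges = [("A", "B"), ("B", "C"), ("C", "D")]
paths = paths_from("A", edges)

# Flatten the path rows into the distinct things they contain.
rel_occurrences = sum(len(path) for path in paths)
distinct_rels = {edge for path in paths for edge in path}
distinct_nodes = {node for edge in distinct_rels for node in edge}

print(len(paths), rel_occurrences, len(distinct_rels), len(distinct_nodes))
# prints: 3 6 3 4
```

Three path rows carry six relationship occurrences, yet only three distinct relationships and four distinct nodes. On a larger, undirected match the gap grows much faster, which is consistent with path rows using up a LIMIT before every node has appeared.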

When following multiple hops, should I always explicitly make sure the same nodes aren't traversed via where clauses?

Well, the way you did it, yes -- but honestly I haven't ever run into that problem before. Part of the issue again is the overly-broad query you're running. Lacking any constraint, it ends up roping in the items you've already matched, which buys you this problem. Perhaps better would be to match some set of possible labels, to narrow your query down. By narrowing it down, you wouldn't have the same issue, for example something like:

MATCH (n)-[r*]-(m)
WHERE 'foo' in labels(m) or 'bar' in labels(m)
RETURN n, r, m;

Note we're not doing path matching, and we're specifying some range of labels that could be m, without leaving it completely wild-west. I tend to formulate queries this way, so your question #2 never really arises. Presumably you have a reasonable data model that would act as your grounding for that.
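
As a rough illustration of what the label constraint buys you, the Python sketch below simulates an undirected variable-length match on a tiny made-up graph. The node names, the 'start' and 'baz' labels, the edge list, and the helper end_nodes are all assumptions for the example; only 'foo' and 'bar' come from the query above.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]  # undirected here; assumes no parallel edges

# Hypothetical toy data: node name -> labels.
labels: Dict[str, Set[str]] = {
    "n1": {"start"}, "n2": {"baz"}, "n3": {"bar"}, "n4": {"foo"},
}
# n1-n2-n3 plus n1-n3 forms a cycle, so paths can loop back to n1.
edges: List[Edge] = [("n1", "n2"), ("n2", "n3"), ("n3", "n4"), ("n1", "n3")]


def end_nodes(start: str, edges: List[Edge]) -> List[str]:
    """End node of every undirected path (length >= 1) from `start`,
    never reusing an edge within a path. One entry per path."""
    ends: List[str] = []

    def extend(node: str, used: List[Edge]) -> None:
        for edge in edges:
            if edge in used or node not in edge:
                continue
            other = edge[1] if edge[0] == node else edge[0]
            ends.append(other)
            extend(other, used + [edge])

    extend(start, [])
    return ends


ends = end_nodes("n1", edges)

# Equivalent in spirit to: WHERE 'foo' in labels(m) or 'bar' in labels(m)
wanted = {"foo", "bar"}
matched = [m for m in ends if labels[m] & wanted]

print(len(ends), len(matched), sorted(set(matched)))
# prints: 8 4 ['n3', 'n4']
```

In this toy graph the unconstrained match yields eight rows, two of which end back on the start node via the cycle. The label filter drops those along with the unwanted 'baz' node, and taking the distinct end nodes leaves just two.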
