Incorrect results when matching multiple labels with Cypher

Question

I have a graph that has nodes that represent roles in a permission hierarchy. The permission hierarchy looks like this:

(:role:owners)-[:CONTACT]->(:role:admins)-[:CONTACT]->(:role:employees)
-[:CONTACT]->(:role:contacts)

Contacts are then attached to each of the role nodes like this:

(:contact {id: "1"})-[:CONTACT]->(:role:owners)
(:contact {id: "2"})-[:CONTACT]->(:role:employees)

I'm trying to write a query that returns the :contacts role node if the contact is an owner or an admin (such as contact 1), and the :employees role node if the contact is an employee (such as contact 2).

MATCH (c:contact {id: "1"})
WITH c

MATCH (g:role)
WHERE
  (c)-[:CONTACT*1..2]->(:admins)-->(:employees)-->(g:contacts) OR
  (c)-[:CONTACT]->(g:employees)
RETURN DISTINCT c, g

Expected result

c                          g
(:contact {id: "1"})       (:role:contacts)
Returned 1 row

Actual result

c                          g
(:contact {id: "1"})       (:role:owners)
(:contact {id: "1"})       (:role:admins)
(:contact {id: "1"})       (:role:employees)
(:contact {id: "1"})       (:role:contacts)
Returned 4 rows

I thought this query would only return nodes that have both the :role label and either :contacts or :employees, but it also returns the nodes labelled :owners and :admins.

Why is it returning these extra nodes and how can I prevent it from doing so? This is for Neo4j 2.1.2.

Additional Information

I think I've found the bug. Looks like the label check is being dropped if a valid path is not found. In the following example, note that there is no path between (c1) and (g:projectcontacts) since there are no (:projectcontacts) nodes. In this case, the projectcontacts match is dropped and all nodes that match (c1)-[:CONTACT*1..3]->(g:role) are returned instead.

This is on a Neo4j 2.1.3 server:

neo4j-sh (?)$ CREATE
>   (o:role:owners {name:"Owners"})-[:CONTACT]->(:role:admins {name:"Admins"})-[:CONTACT]->
>   (e:role:employees {name:"Employees"})-[:CONTACT]->(:role:contacts {name:"Contacts"})
> CREATE
>   (c1:contact {id: "test1"})-[:CONTACT]->(o)
> CREATE
>   (c2:contact {id: "test2"})-[:CONTACT]->(e)
> 
> WITH c1
> 
> MATCH (g:role)
> WHERE
>   (c1)-[:CONTACT*1..2]->(:admins)-->(:employees)-->(g:contacts) OR
>   (c1)-[:CONTACT]->(g:employees) OR
>   (c1)-[:CONTACT*1..3]->(g:projectcontacts)
> RETURN DISTINCT c1, g;
+-------------------------------------------------------+
| c1                     | g                            |
+-------------------------------------------------------+
| Node[4439]{id:"test1"} | Node[4438]{name:"Contacts"}  |
| Node[4439]{id:"test1"} | Node[4436]{name:"Admins"}    |
| Node[4439]{id:"test1"} | Node[4437]{name:"Employees"} |
| Node[4439]{id:"test1"} | Node[4435]{name:"Owners"}    |
+-------------------------------------------------------+
4 rows
Nodes created: 6
Relationships created: 5
Properties set: 6
Labels added: 10
50 ms

If we drop the nonexistent node label from our match query, we get the expected results.

neo4j-sh (?)$ CREATE
>   (o:role:owners {name:"Owners"})-[:CONTACT]->(:role:admins {name:"Admins"})-[:CONTACT]->
>   (e:role:employees {name:"Employees"})-[:CONTACT]->(:role:contacts {name:"Contacts"})
> CREATE
>   (c1:contact {id: "test1"})-[:CONTACT]->(o)
> CREATE
>   (c2:contact {id: "test2"})-[:CONTACT]->(e)
> 
> WITH c1
> 
> MATCH (g:role)
> WHERE
>   (c1)-[:CONTACT*1..2]->(:admins)-->(:employees)-->(g:contacts) OR
>   (c1)-[:CONTACT]->(g:employees)
> RETURN DISTINCT c1, g;
+------------------------------------------------------+
| c1                     | g                           |
+------------------------------------------------------+
| Node[4445]{id:"test1"} | Node[4444]{name:"Contacts"} |
+------------------------------------------------------+
1 row
Nodes created: 6
Relationships created: 5
Properties set: 6
Labels added: 10
55 ms

can you give us some sample output? you mean you're getting more than expected in g, right? — Eve Freeman
Correct. Instead of just returning the :role:contacts node it also returns the intermediate :role:owners, :role:admins, and :role:employees nodes. I updated the question with results. — Bill
Looks like a bug to me, somehow the pattern isn't also filtering on the label. Does it fix it if you add an additional constraint, like WHERE (g:contacts) OR (g:employees) WITH g before that? Any chance of getting some sample data in console or something? — Eve Freeman
I've updated the question. I think I found the bug and it still exists in 2.1.3. — Bill
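
A minimal sketch of the extra constraint suggested in the comments above, assuming the "test1" sample data from the question. The label checks are written as plain predicates on g instead of inside the path patterns, so each path only has to end at the already bound g. It has not been run against 2.1.x, so whether it avoids the behaviour described in the question is an assumption.

```cypher
// Sketch only: label tests on g are separate predicates, not part of the paths.
MATCH (c1:contact {id: "test1"})
MATCH (g:role)
WHERE
  (g:contacts AND (c1)-[:CONTACT*1..2]->(:admins)-->(:employees)-->(g)) OR
  (g:employees AND (c1)-[:CONTACT]->(g))
RETURN DISTINCT c1, g
```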

Jim Biard · Accepted Answer · 2014-08-21T15:41:20

Bill,

I don't know this for sure, but my guess is that the problem comes from redefining g by adding labels to it in your WHERE clause. Having a label applied may cause g to be viewed as a pattern to be matched.

Whether that is right or wrong, here is a way to do the query that doesn't run afoul of the problem (now corrected to do what you are wanting to do):

MATCH (c1:contact {id : 'test1'})
WITH c1
MATCH (c1)-[:CONTACT]->(f:role)-[:CONTACT*1..]->(g:contacts)
WITH c1, CASE WHEN 'employees' IN labels(f) THEN f ELSE g END AS h
RETURN DISTINCT c1, h
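
Another way to avoid labelling g inside WHERE is to give each case its own MATCH and combine them with UNION. This is a hedged sketch under the question's sample data (the id 'test1' is taken from there), not something verified on 2.1.x:

```cypher
// Case 1: the contact reaches :admins within one or two hops, then follows
// the chain down to the :contacts role.
MATCH (c1:contact {id: 'test1'})-[:CONTACT*1..2]->(:admins)
      -[:CONTACT]->(:employees)-[:CONTACT]->(g:role:contacts)
RETURN c1, g
UNION
// Case 2: the contact is attached directly to the :employees role.
MATCH (c1:contact {id: 'test1'})-[:CONTACT]->(g:role:employees)
RETURN c1, g
```

UNION (without ALL) removes duplicate rows, so DISTINCT is not needed here. All labels sit in the MATCH patterns, and a branch that finds no path simply contributes no rows.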

Grace and peace,

Jim
