3
votes

My last question was closed as a duplicate of "Confused about MERGE sometimes creating duplicate relationship". However, I was unable to find a solution there, and this question is about duplicate relationships, not duplicate nodes.

I have a query that runs when a user visits another user's profile:

    MATCH (you:User {user_id: { myId }}), (youVisited:User {user_id: { id }})
    MERGE (you)-[yvr:VISITED]->(youVisited)
    SET yvr.seen = false, yvr.created_at = timestamp()
    RETURN yvr.created_at as visited_at

I noticed that in rare cases a duplicate [:VISITED] relationship is created. For (1057)-[:VISITED]->(630), both relationships have the same properties, and there is only ever supposed to be one [:VISITED] between the same User nodes. The next time the user visits, the query should simply MERGE the existing [:VISITED] and update its properties, [:VISITED {created_at: ..., seen: false}]:

    {
        created_at: 1485800172734,
        seen: false
    }

I thought the point of MERGE was to prevent this. Clearly it doesn't, so why does this happen, and how can I make sure it doesn't?

I have looked up some other things, but I am not sure whether the information is reliable or up to date. For example, going by http://neo4j.com/docs/developer-manual/current/cypher/clauses/create-unique/, am I supposed to be using CREATE UNIQUE instead? I thought MERGE was pretty much a better replacement for it.

2
Which version of Neo4j are you using, by the way? There was an issue with incorrect MERGE locking that was fixed on the 3.0 branch. I'm trying to verify whether this fix has made its way into the current releases or is still in the pipeline. - InverseFalcon
If it's not fixed yet on the current branches, you may need to manually lock on the nodes in question before you MERGE. APOC's locking procedures may help you out here. - InverseFalcon
Confirmed as of 2/10/2017 this fix is not yet part of the current releases. - InverseFalcon
@InverseFalcon Thanks for your input (you answer all my Neo4j questions :P). I am running Neo4j Community Edition 3.1.0 using bolt driver github.com/neo4j/neo4j-javascript-driver. - atkayla
A bit late, and it may not be relevant to your use case or implementation, but the issue of duplicate relationships arose in my SO question on a different matter; refer to the discussion there on how we solved it. stackoverflow.com/questions/49682338/… - Victoria Stuart

2 Answers

2
votes

I agree that in some cases, MERGE and CREATE UNIQUE can be used for the same purpose. MERGE does not replace CREATE UNIQUE, however.

For example, MERGE allows multiple matches, and its pattern has to match the graph in full to count as a match; a partial match is not reused, and the whole pattern is created again. CREATE UNIQUE, on the other hand, will error on multiple matches and allows partial matches: it will attempt to reuse existing parts of your graph and add the missing parts.

As mentioned in the docs, there also seems to be a difference regarding uniqueness of relationships, i.e. what you are experiencing:

MERGE might be what you want to use instead of CREATE UNIQUE. Note however, that MERGE doesn’t give as strong guarantees for relationships being unique.

I'll leave it up to the developers of Neo4j to explain exactly what those guarantees are. I can only say that in your particular case, CREATE UNIQUE seems a better fit than MERGE anyway: if your intent is to only ever allow a single VISITED relationship from one user to another - his last visit - and multiple VISITED relationships are a violation of your data model, then by all means use CREATE UNIQUE to document this intent, and enforce it at the database level at the same time.
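As a minimal sketch, this is the query from the question with only the MERGE clause swapped for CREATE UNIQUE. It assumes a Neo4j version that still supports CREATE UNIQUE and keeps the parameter syntax used in the question:

    // Same lookup as in the question: both users must already exist
    MATCH (you:User {user_id: { myId }}), (youVisited:User {user_id: { id }})
    // Reuse the VISITED relationship if one exists, otherwise create it
    CREATE UNIQUE (you)-[yvr:VISITED]->(youVisited)
    // Refresh the properties on every visit
    SET yvr.seen = false, yvr.created_at = timestamp()
    RETURN yvr.created_at AS visited_at

I have not verified how this behaves under concurrent writes, so treat it as an illustration of the intent rather than a guaranteed fix.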

In this case, one could argue that the VISITED relationship is also not particularly well-named, since it implies that there could be more: one for each time that a user visited another user's profile.

1
votes

As mentioned in my comments, there was a locking bug in MERGE that appeared when Neo4j switched to the COST planner.

As far as I can tell it works like this:

Due to the bug, double-checked locking wasn't occurring. After MERGE determines that the relationship doesn't exist, it locks the nodes in preparation for creating the relationship. However, there is a race between the existence check and the lock acquisition: a concurrent MERGE or CREATE could create the relationship just before the locks are acquired, so the first MERGE goes on to create a duplicate.

The fix will ensure MERGE checks for the existence of the relationship again after the locks are acquired. This should restore concurrency guarantees for MERGE.

This fix is not yet in current Neo4j releases as of 2/10/2017.
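Since a graph may already contain duplicates created by this race, a query along the following lines could be used to find them. This is a sketch based on the User label, user_id property and VISITED type from the question; the returned column names are my own choice:

    // List every pair of users connected by more than one VISITED relationship
    MATCH (a:User)-[r:VISITED]->(b:User)
    WITH a, b, collect(r) AS rels
    WHERE size(rels) > 1
    RETURN a.user_id AS from_id, b.user_id AS to_id, size(rels) AS copies

If the duplicates really are interchangeable, as in the question where both carry the same properties, a variant of the same query could keep the first relationship of each pair and delete the rest:

    // Keep one VISITED relationship per pair and delete the extras
    MATCH (a:User)-[r:VISITED]->(b:User)
    WITH a, b, collect(r) AS rels
    WHERE size(rels) > 1
    FOREACH (extra IN tail(rels) | DELETE extra)

It is probably worth running the read-only query first and checking the results before deleting anything.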

In the meantime, you can explicitly lock on the nodes in question before you MERGE to prevent the race condition.

You can do this by setting or removing a nonexistent property on the nodes in question, or by using APOC's locking procedures.
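Here is a rough sketch of both approaches applied to the query from the question. The first assumes the APOC plugin is installed and uses its apoc.lock.nodes procedure to take write locks on both nodes before the MERGE runs:

    MATCH (you:User {user_id: { myId }}), (youVisited:User {user_id: { id }})
    // Take write locks on both nodes so no concurrent MERGE can slip in
    CALL apoc.lock.nodes([you, youVisited])
    MERGE (you)-[yvr:VISITED]->(youVisited)
    SET yvr.seen = false, yvr.created_at = timestamp()
    RETURN yvr.created_at AS visited_at

Without APOC, removing a property that doesn't exist should have the same locking effect. The property name _lock below is hypothetical; any name that is not otherwise used on User nodes would do:

    MATCH (you:User {user_id: { myId }}), (youVisited:User {user_id: { id }})
    // Removing a nonexistent property is meant only to acquire the write locks
    REMOVE you._lock, youVisited._lock
    MERGE (you)-[yvr:VISITED]->(youVisited)
    SET yvr.seen = false, yvr.created_at = timestamp()
    RETURN yvr.created_at AS visited_at

In both cases the locks are held until the transaction ends, so the existence check inside MERGE and the creation of the relationship happen under the same locks.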