21
votes

TL;DR: I am either losing my mind, or neo4j's transactions are slightly broken. It looks like uncommitted nodes are available outside of committed transactions, with missing properties - or something equally weird.

Our node.js app uses neo4j. A part of it has to generate unique IDs. We have the following cypher query that is meant to locate the last :Id-type node and attempt to commit a new :Id node with last_uuid+1.

MATCH (i:Id) WITH i ORDER BY i.uuid DESC LIMIT 1  #with it like a sub-return, will "run" the rest with the last i at read-time
CREATE (n:Id {label:"Test"}) 
SET n.uuid = i.uuid + 1
RETURN n

There is also a constraint:

neo4j-sh (?)$ schema
Indexes
  ON :Id(uuid) ONLINE (for uniqueness constraint) 

Constraints
  ON (id:Id) ASSERT id.uuid IS UNIQUE

And the DB is initialized with an (:Id{uuid:1}) to kick-start this joy.

The app code basically retries the above query until it is successful. If two or more Id creation requests hit at the same time, only one of them will go through, the rest of them would fail and be retried by the app code.

This worked, until we tried it in parallel.

The code started returning data without a uuid. After a lot of investigation, it turns out that the write part of the query (CREATE...) is somehow receiving an :Id from MATCH which has no .uuid (or other) properties. This shouldn't be possible. This is the only code that operates on those nodes.

The strangest (perhaps) thing, is that if I save i's nodeid in order to locate that node in the DB, it actually exists and has a .uuid property.

In order to isolate this behaviour I wrote a PoC: neo4j-transaction-test It should be really simple to run with nodejs.

It is basically a tad bit more than the above code - tries to create the Id, setting prev_label, prev_nodeid, and prev_uuid to the previous Node's (i) values. It runs the query for every GET request it receives on localhost:9339 and outputs:

 > node server.js 
 * 1412125626667 Listening on 9339
 Req Id | Datetime | -> $uuid $nodeid
 1 1412125631677 'GET'       # When it first receives the GET request
 1 1412125631710 '->' 9 60   # When neo4j returns; numbers are $uuid $node_id)

when things start getting concurrent, queries may fail:

3 1412125777096 '(retry) (0)' 'Node 64 already exists with label Id and property "uuid"=[13]'
4 1412125777098 '(retry) (0)' 'Node 64 already exists with label Id and property "uuid"=[13]'
de[]

which is to be expected, and they are retried. If we "slam" the server with a couple of reqs per second (ab -n 1000 -c 10 http://localhost:9339/), we will eventually see:

...
59 1412127103011 'GET'
23 1412127103024 'ERROR - EMPTY UUID' '{"this_nodeid":22,"prev_nodeid":20,"label":"Test"}'

Error: Empty UUID received

(and by eventually, I mean almost instantly) A node comes back, without uuid, prev_uuid or prev_label. this_nodeid and prev_nodeid refer to neo4j's internal id. If we look these up, starting with the previous (i) Id node (by nodeid - 20):

neo4j-sh (?)$ match (i) where id(i)=20 return i;
+--------------------------------------------------------------------------------------------+
| i                                                                                          |
+--------------------------------------------------------------------------------------------+
| Node[20]{uuid:10,label:"Test",prev_label:"Test",prev_uuid:9,prev_nodeid:17,this_nodeid:20} |
+--------------------------------------------------------------------------------------------+
1 row
19 ms

It is exactly as it should be. .uuid and all. The new one is indeed created just as it is returned above:

neo4j-sh (?)$ match (i) where id(i)=22 return i;
+------------------------------------------------------+
| i                                                    |
+------------------------------------------------------+
| Node[22]{label:"Test",prev_nodeid:20,this_nodeid:22} |
+------------------------------------------------------+
1 row
17 ms

No prev_label or prev_uuid. How is this possible? What am I missing? Is an incomplete :Id node leaking into my query?

I've tried restarting, wiping the data dir, restarting after wiping the data dir, scavenged the logs (nothing interesting, or even boring but at the right time - when the above is happening). I am now at the point where I am questioning my understanding of how this is supposed to work.

This is on 12.04 with neo4j 2.1.1. More Version Info and Neo4j startup/shutdown logs.

I am aware that this is not an optimal way to create UUIDs. This question is about understanding how these results are possible if neo4j's transactions work as expected.

1
I understand that. If we stick with neo4j (not very likely) then the ID generation code will be moved off of the DB. I still want to know how I'm seeing this result though - it indicates something broken with cypher transactions.Tasos Bitsios
Instead of Create have you used Merge (superior locking)? Could you also take the Neo4J Node.js code out and try hitting the transactional endpoint directly to rule that out (docs.neo4j.org/chunked/stable/rest-api-transactional.html) as I think that particular (unofficial) Node library hits the legacy endpoint (Your Neo is newish, your Node is not). You can also then see the raw response.JohnMark13
@TasosBitsios was there any resolution to this?JohnMark13
Sorry about the delay @JohnMark13, real life got in the way. There hasn't been any resolution, I think I'll open a GH issue with the neo4j team. RE: MERGE, I haven't tried it, and even if it worked as expected I would still want to know what is happening here. RE: versions/API, AFAIK there is no non-transactional API. The legacy endpoint will wrap the query in a transaction, and the newer one will allow a transaction spanning across queries. This doesn't really change the fact that this result doesn't make sense.Tasos Bitsios
Did you ever post an issue? I'm running into this or something extremely similar now, too. I opened one at github.com/neo4j/neo4j/issues/3864subvertallchris

1 Answers

1
votes

We noticed a similar issue in Neo4J during concurrent transactional writes and there was a fix introduced in Neo4J 2.2.5 to address this (we have Enterprise support and raised a ticket to get this fixed). You may not have exactly the same problem, but it might be worth trying again using 2.2.5 to see if it's still an issue.

Like you said, there are better ways to generate ids. Also, you should use MAX to get the latest instead of LIMIT and ORDER BY, but that's besides the point also ;-).

Good luck.