I work on a project that requires a lot of information to be played into Neo4J via py2neo rapidly, and I've hit a snag in attempting to use Python's multiprocessing library to expedite the process.
In general terms, what we're doing is twofold, where both parts are happening in unison:
Part 1:
- Create a relationship between a pair of nodes (A->B)
- Create another node, which has one relationship to each of the previous pair (C->A, C->B)
- Push all 3 nodes and their relationships into Neo4j,
- Push the ID of node C to a kafka topic
Part 2:
- Create N "threads" via python's multiprocessing, all of which open their own kafka consumer
- consumers read the topic, and execute code that may update relationships/nodes or create additional ones.
The issue that's arising is happening in part 2, specifically when updating or creating relationships or nodes. py2neo is throwing a py2neo.database.TransientError
, which according to the documentation is:
The database cannot service the request right now, retrying later might yield a successful outcome.
I've tried adding a safety check that will re-try the transaction X times, and re-push it to the kafka topic for another thread to try after X failed attempts, but the end results are still incorrect.
I'm woefully ignorant when it comes to multithreading/multiprocessing, so I feel as though my initial research efforts have been severely off-base, and I've wasted a lot of cycles trying to remedy this -- any direction or insight is greatly appreciated.
Can provide additional info if necessary -- not sure what I might be missing.
Thanks