I'm trying to build a Pyramid application. I started with the SQLAlchemy scaffolding. I'm running into a issue and I'm wondering what the best way is to address it. In one of my views I need to select a lot of rows from two unrelated tables. I need to make sure no rows were inserted into the second table between the time I selected the rows from the first table and the time I selected the rows from the second table.
I have three models, Node, Test, and Tasking. Both Nodes and Tests have quite a bit of metadata. Given a list of Nodes and a list of Tests, a global list of Taskings can be created. For example, we could have three Nodes, a, b, and c and two Tests "we need one node to do task P" and "we need two nodes to do task Q".
From that information, three Tasks should be created. For example:
- "Node
ashould do taskP" - "Node
bshould do taskQ" - "Node
cshould do taskQ"
Now, I'm attempting to provide a REST API for this. The vast majority of time clients will be requesting the list of Tasks, so that needs to be fast. However, sometimes a client might add a Node or a Test. When that happens, I need the entire list of Tasks to be regenerated.
Here's a rough example:
@view_config(route_name='list_taskings')
def list_taskings(request):
return DBSession.Query(Tasking).all()
@view_config(route_name='add_node')
def add_node(request):
DBSession.add(Node())
_update_taskings()
@view_config(route_name='add_test')
def add_test(request):
DBSession.add(Test())
_update_taskings()
def _update_taskings():
nodes = DBSession.query(Node).all()
tests = DBSession.query(Test).all()
# Process...
Tasking.query.delete()
for t in taskings:
DBSession.add(t)
I'm using the default Pyramid SQLAlchemy scaffolding. So, each request auto starts a transaction. So, if _update_tasking is called from one request (say add_node), then the new node will be added to the local DBSession, and querying for all Nodes and Tests in _update_tasking will return that new element. In addition, deleting all existing Taskings and adding the newly computed ones is also safe.
I have two problems:
What happens if a new row is added into the
Teststable between the time I get my list ofnodesand my list oftestsin_update_taskings? In my real world production system these selects are close together but not right next to each other. There is the possibility of a race condition.How do I ensure two requests that will update the
Taskingsdon't overwrite each other? For example, imagine if our existing system had oneNodeand oneTest. Two requests come in at the same, one to add aNodeand one to add aTest. Even if problem #1 wasn't an issue and I knew that each request's pair of selects represented "a single instance of time in the database", there's still the problem of one request overriding the other. If the first request finishes first with now twoNodesand oneTest, the second request will still be selecting the old data (potentially) and will generate a list ofTaskingswith oneNodeand twoTests.
So, what's the best way to deal with this? I'm using SQLite for development and PostgreSQL in production, but I'd like a database agnostic solution. I'm not worried about other applications accessing this database. My REST API will be the only access mechanism. Should I put a lock around any of requests that mutate the database (adding a Node or a Test)? Should I lock the database somehow?
Thanks for any help!