27
votes

I'm using JDO 2.3 on App Engine. I was using the Master/Slave datastore for local testing and recently switched over to the HRD datastore for local testing, and parts of my app are breaking (which is to be expected). One part that's breaking is where it sends a lot of writes quickly: because of the roughly one-write-per-second-per-entity-group limit, it fails with a ConcurrentModificationException.

Okay, that's also to be expected, so I have the browser retry the writes later when they fail (maybe not the best hack, but I'm just trying to get it working quickly).
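
For what it's worth, a server-side retry with backoff would be a cleaner alternative to retrying from the browser. Here is a minimal sketch of that pattern, not my actual code; doWrite() is a placeholder for the transactional read-modify-write shown further down, and the attempt count and sleep times are arbitrary:

// Hypothetical server-side retry helper; doWrite() stands in for the
// begin/commit block shown below.
private String writeWithRetries(int maxAttempts) {
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
        try {
            doWrite();                        // the transactional read-modify-write
            return "OK";
        } catch (JDOCanRetryException e) {
            try {
                Thread.sleep(100L * attempt); // crude linear backoff
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt();
                break;
            }
        }
    }
    return "RETRY";                           // still failing; let the client decide
}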

But a weird thing is happening. Some of the writes that should be succeeding (the ones that DON'T get the ConcurrentModificationException) are also failing, even though the commit phase completes and the request returns my success code. I can see from the log that the retried requests work okay, but these other requests, which seem to have committed on the first try, are, I guess, never "applied." From what I read about the apply phase, writing again to the same entity should force the apply... but it doesn't.

Code follows. Some things to note:

  1. I am attempting to use automatic JDO caching, where JDO uses memcache under the covers. This doesn't actually work unless you wrap everything in a transaction (see the configuration sketch after this list).
  2. All the requests do is read a string out of an entity, modify part of the string, and save that string back to the entity. Without transactions these requests would of course have the "dirty read" problem; but with transactions, isolation is supposed to be at the "serializable" level, so I don't see what's happening here.
  3. The entity being modified is a root entity (not a child in an entity group).
  4. I have cross-group (XG) transactions enabled.
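
For reference, PMF in the code below is the usual PersistenceManagerFactory singleton helper; here is roughly how mine is configured. The two datanucleus.* property names are the ones I believe the App Engine DataNucleus plugin uses for the memcache-backed L2 cache and for XG transactions, so treat this as a sketch and double-check them against your plugin version (the same entries can live in jdoconfig.xml instead):

import java.util.Properties;

import javax.jdo.JDOHelper;
import javax.jdo.PersistenceManager;
import javax.jdo.PersistenceManagerFactory;

public final class PMF {
    private static final PersistenceManagerFactory pmf = createFactory();

    private static PersistenceManagerFactory createFactory() {
        Properties props = new Properties();
        props.setProperty("javax.jdo.PersistenceManagerFactoryClass",
                "org.datanucleus.api.jdo.JDOPersistenceManagerFactory");
        props.setProperty("javax.jdo.option.ConnectionURL", "appengine");
        // memcache-backed level-2 cache ("automatic JDO caching", note 1)
        props.setProperty("datanucleus.cache.level2.type", "javax.cache");
        // cross-group (XG) transactions (note 4)
        props.setProperty("datanucleus.appengine.datastoreEnableXGTransactions", "true");
        return JDOHelper.getPersistenceManagerFactory(props);
    }

    private PMF() {}

    public static PersistenceManager getManager() {
        return pmf.getPersistenceManager();
    }
}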

The relevant code (this is a simplified version):

// imports: javax.jdo.PersistenceManager, javax.jdo.Transaction,
// javax.jdo.JDOCanRetryException, com.google.appengine.api.datastore.Key,
// com.google.appengine.api.datastore.Text
PersistenceManager pm = PMF.getManager();
Transaction tx = pm.currentTransaction();
String responsetext = "";
try {
    tx.begin();
    // I have extra calls to "makePersistent" because I found that relying
    // on pm.close didn't always write the objects to cache, maybe that
    // was only a DataNucleus 1.x issue though
    Key userkey = obtainUserKeyFromCookie();
    User u = pm.getObjectById(User.class, userkey);
    pm.makePersistent(u); // to make sure it gets cached for next time
    Key mapkey = obtainMapKeyFromQueryString();
    // this is NOT a java.util.Map, just FYI
    Map currentmap = pm.getObjectById(Map.class, mapkey);
    Text mapData = currentmap.getMapData(); // mapData is JSON stored in the entity
    Text newMapData = parseModifyAndReturn(mapData); // transform the map
    currentmap.setMapData(newMapData); // mutate the Map object
    pm.makePersistent(currentmap); // make sure to persist so there is a cache hit
    tx.commit();
    responsetext = "OK";
} catch (JDOCanRetryException jdoe) {
    // log jdoe
    responsetext = "RETRY";
} catch (Exception e) {
    // log e
    responsetext = "ERROR";
} finally {
    if (tx.isActive()) {
        tx.rollback();
    }
    pm.close();
}
resp.getWriter().println(responsetext);

UPDATE: I am pretty sure I know why this is happening, but I will still award the bounty to anyone who can confirm it.

Basically, I think the problem is that transactions are not really implemented in the local version of the datastore. References:

https://groups.google.com/forum/?fromgroups=#!topic/google-appengine-java/gVMS1dFSpcU
https://groups.google.com/forum/?fromgroups=#!topic/google-appengine-java/deGasFdIO-M
https://groups.google.com/forum/?hl=en&fromgroups=#!msg/google-appengine-java/4YuNb6TVD6I/gSttMmHYwo0J

Because transactions are not implemented, rollback is essentially a no-op. Therefore, I get a dirty read when two transactions try to modify the record at the same time. In other words: A and B both read the data at the same time; A attempts to modify one part of the data, and B attempts to modify a different part; A writes to the datastore, then B writes, obliterating A's changes. Then B is "rolled back" by App Engine, but since rollbacks are a no-op when running on the local datastore, B's changes stay and A's do not. Meanwhile, since B is the thread that threw the exception, the client retries B, but does not retry A (since A was supposedly the transaction that succeeded).
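
If anyone wants to try to confirm this, below is a rough sketch of the kind of test I have in mind, using the local test harness with unapplied-job simulation turned on to mimic the HRD. It pushes two threads through the read-modify-write interleaving described above. The kind name, the property, and the latch choreography are invented for illustration, and I have NOT verified that the local stub surfaces the conflict this way:

import java.util.concurrent.CountDownLatch;

import com.google.appengine.api.datastore.DatastoreService;
import com.google.appengine.api.datastore.DatastoreServiceFactory;
import com.google.appengine.api.datastore.Entity;
import com.google.appengine.api.datastore.Key;
import com.google.appengine.api.datastore.Transaction;
import com.google.appengine.tools.development.testing.LocalDatastoreServiceTestConfig;
import com.google.appengine.tools.development.testing.LocalServiceTestHelper;
import com.google.apphosting.api.ApiProxy;

public class LostUpdateSketch {

    public static void main(String[] args) throws Exception {
        // Simulate the HRD locally; 100% unapplied jobs exaggerates the apply lag.
        LocalServiceTestHelper helper = new LocalServiceTestHelper(
                new LocalDatastoreServiceTestConfig()
                        .setDefaultHighRepJobPolicyUnappliedJobPercentage(100));
        helper.setUp();

        DatastoreService ds = DatastoreServiceFactory.getDatastoreService();
        Entity seed = new Entity("Map");                 // hypothetical kind
        seed.setProperty("mapData", "AAAA|BBBB");        // two "halves" to modify
        final Key key = ds.put(seed);

        final ApiProxy.Environment env = ApiProxy.getCurrentEnvironment();
        final CountDownLatch bothHaveRead = new CountDownLatch(2);

        // A modifies the first half, B the second, with both reads forced to
        // happen before either write, like the A/B scenario above.
        Thread a = new Thread(worker(env, key, bothHaveRead, 0, "XXXX"));
        Thread b = new Thread(worker(env, key, bothHaveRead, 1, "YYYY"));
        a.start(); b.start();
        a.join(); b.join();

        // With real serializable transactions, the committed thread's change
        // survives and the rolled-back thread's change disappears. A lost
        // update here would confirm that local rollback is a no-op.
        System.out.println("final mapData = " + ds.get(key).getProperty("mapData"));
        helper.tearDown();
    }

    private static Runnable worker(final ApiProxy.Environment env, final Key key,
            final CountDownLatch bothHaveRead, final int half, final String val) {
        return new Runnable() {
            public void run() {
                ApiProxy.setEnvironmentForCurrentThread(env); // share the test env
                DatastoreService ds = DatastoreServiceFactory.getDatastoreService();
                Transaction txn = ds.beginTransaction();
                try {
                    Entity e = ds.get(txn, key);
                    String[] parts = ((String) e.getProperty("mapData")).split("\\|");
                    bothHaveRead.countDown();
                    bothHaveRead.await();                 // force the interleaving
                    parts[half] = val;                    // modify only our half
                    e.setProperty("mapData", parts[0] + "|" + parts[1]);
                    ds.put(txn, e);
                    txn.commit();
                    System.out.println(val + " committed");
                } catch (Exception ex) {
                    System.out.println(val + " failed: " + ex);
                } finally {
                    if (txn.isActive()) {
                        txn.rollback();
                    }
                }
            }
        };
    }
}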

1
Have you thought of redesigning your datastore and how you use it, in order to avoid persisting to the same entity group more than once per second? Alternatively, have you tried handing off persisting to the datastore to enqueued tasks, and arranging things to respect the 1/s entity-group write frequency limit? – Ian Marshall
I have thought of that. But before I do that, I'd like to understand why this particular bug is happening. My concern is that I fundamentally don't understand something about the HRD or App Engine/JDO transactions, or that I missed something in the documentation, and it's going to bite me later, because I have at least 25 other services that I need to add transactions to (JDO caching won't work if datastore accesses are not in a transaction). – eeeeaaii
FWIW, using the current plugin (GAE JDO v2.x), I see no requirement for access to be in a transaction for the L2 cache to work; if an object is read in then it is L2 cached, and if it isn't then it ought to be reported (obviously the old plugin isn't supported, so only report such a thing with the current one). – DataNucleus
@DataNucleus upgraded to the new plugin, got the same behavior. What I don't understand is that the code does two datastore reads, then a write. When caching is enabled, you would think the two reads would come from cache, so the only thing that would go to the datastore would be the write. But that's not what happens. Instead, the only billed operation is one datastore read, and no datastore writes occur. Why? – eeeeaaii
@DataNucleus: and just to clarify, when the two reads and the single write are not in a transaction, then there's no caching: all three of these operations go to the datastore. – eeeeaaii

1 Answer

1
vote

This may be bad news for you, but I left JDO behind: I now use Objectify, and in some places the DataNucleus API directly. I have full control over my persistence, which is the better choice for both performance and design if you think long term.

Because the datastore is NoSQL, there are structural differences from JPA, JDO, and the standard assumptions that come with them:

Using the native DataNucleus API you can do things that exist neither in standard JPA nor even in Objectify; the example I used was creating columns dynamically.

True transactions are not present in GAE; there is only something that can sometimes look like a transaction (entity groups). Using the native API spares you this kind of unpredictable gymnastics (see the sketch below).
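
To make the entity-groups point concrete, here is a sketch of what a transaction looks like against the low-level datastore API, which both Objectify and the DataNucleus plugin ultimately wrap (the kind and property names are invented for the example):

import com.google.appengine.api.datastore.DatastoreService;
import com.google.appengine.api.datastore.DatastoreServiceFactory;
import com.google.appengine.api.datastore.Entity;
import com.google.appengine.api.datastore.EntityNotFoundException;
import com.google.appengine.api.datastore.Key;
import com.google.appengine.api.datastore.Text;
import com.google.appengine.api.datastore.Transaction;

public class MapUpdater {
    // Read-modify-write of a single entity, atomic within its entity group.
    public static void updateMapData(Key mapKey, String newJson)
            throws EntityNotFoundException {
        DatastoreService ds = DatastoreServiceFactory.getDatastoreService();
        Transaction txn = ds.beginTransaction();
        try {
            Entity map = ds.get(txn, mapKey);  // read inside the transaction
            map.setProperty("mapData", new Text(newJson));
            ds.put(txn, map);                  // write inside the transaction
            txn.commit();
        } finally {
            if (txn.isActive()) {
                txn.rollback();                // e.g. after a commit conflict
            }
        }
    }
}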

Trying to drive a car with a joystick could work, but there are surely new things to learn. In my opinion, it is worth learning the native way.