I am reading about GAE and its datastore, and I came across this question and article. If my users can be identified by, say, email, would it be reasonable to use the same parent for all users and the email as the key, with the goal of resolving conflicts when two different users try to use the same email as their identifier? In theory, if the number of users becomes large (say, 10M), could this cause any issues? From my perspective, gets should be just fine; it is the puts that are locked. So if gets significantly dominate puts (which really only happen when a new user is created), I don't see any issues. But....

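import com.google.appengine.api.datastore.Key;
import com.google.appengine.api.datastore.KeyFactory;
Key parent = KeyFactory.createKey("parent", "users");
Key user = KeyFactory.createKey(parent, "user", "[email protected]");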

When to use entity groups in GAE's Datastore https://developers.google.com/appengine/articles/scaling/contention

2 Answers

0
votes

I also faced the unique email issue and here's what I've done:

Set up a kind called "Email" and use the user-supplied email as its string key. This is the only way you can make a field both scalable and unique in the datastore. Then set up another kind called "User" whose key uses an auto-generated ID:

Email

key: email, UserKey: datastore.Key

User

key: auto_id, Password: string, Name: string

In this setup, the email can be used as the login, and users have the option to change their email (or have multiple emails), while each email remains unique system-wide.
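To make the two-kind layout concrete, here is a minimal in-memory sketch. It is not the App Engine Datastore API: `EmailRegistry`, `register`, `changeEmail`, and `lookup` are hypothetical names, and `putIfAbsent` on a map only stands in for what would be a transactional "get the Email entity, put it if missing" in the real datastore.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// In-memory stand-in for the Email/User layout; NOT the Datastore API.
class EmailRegistry {
    // "Email" kind: key = email string, value = id of the owning User.
    private final Map<String, Long> emailToUserId = new ConcurrentHashMap<>();
    // "User" kind: key = auto-generated id, value = name.
    private final Map<Long, String> users = new ConcurrentHashMap<>();
    private final AtomicLong nextId = new AtomicLong(1);

    /** Creates a user and claims the email; returns the new id, or -1 if the email is taken. */
    long register(String email, String name) {
        long id = nextId.getAndIncrement();
        // Assumption: this atomic call plays the role of a transactional get-then-put.
        if (emailToUserId.putIfAbsent(email, id) != null) {
            return -1;
        }
        users.put(id, name);
        return id;
    }

    /** Moves a user to a new email; fails if the new one is taken or the old one is not theirs. */
    boolean changeEmail(long userId, String oldEmail, String newEmail) {
        if (!Long.valueOf(userId).equals(emailToUserId.get(oldEmail))) return false;
        if (emailToUserId.putIfAbsent(newEmail, userId) != null) return false;
        emailToUserId.remove(oldEmail, userId);
        return true;
    }

    /** Returns the id of the user owning this email, or null if nobody has claimed it. */
    Long lookup(String email) {
        return emailToUserId.get(email);
    }

    public static void main(String[] args) {
        EmailRegistry registry = new EmailRegistry();
        long alice = registry.register("alice@example.com", "Alice");
        // A second registration with the same email is rejected.
        System.out.println(registry.register("alice@example.com", "Mallory"));
        // Alice can still move to a new, unclaimed email.
        System.out.println(registry.changeEmail(alice, "alice@example.com", "a@example.org"));
    }
}
```

The point of the split is that uniqueness is enforced per email key, so two signups only collide when they claim the same email, rather than every signup contending on one shared parent.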

====================

It's not scalable if you put every user under the same parent. You will end up with all the data stuck on one particular "server", because entities from the same entity group are stored in close proximity. You will also end up facing the per-entity-group write limit (roughly 1 to 5 writes per second).

=====================

As a general rule of thumb, anything that needs to scale (e.g. users) must be a root entity to enjoy the benefit of datastore scalability.

0
votes

I think I have found the answer to my question, in the "Causes of Errors" section of https://developers.google.com/appengine/articles/handling_datastore_errors:

The first type of timeout occurs when you attempt to write to a single entity group too quickly. Writes to a single entity group are serialized by the App Engine datastore, and thus there's a limit on how quickly you can update one entity group. In general, this works out to somewhere between 1 and 5 updates per second; a good guideline is that you should consider rearchitecting if you expect an entity group to have to sustain more than one update per second for an extended period. Recall that an entity group is a set of entities with the same ancestor—thus, an entity with no children is its own entity group, and this limitation applies to writes to individual entities, too. For details on how to avoid datastore contention, see Avoiding datastore contention. Timeout errors that occur during a transaction will be raised as a appengine.ext.db.TransactionFailedError instead of a Timeout.
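As a rough back-of-the-envelope check on what that guideline means for the 10M-user case, the sketch below computes how long it would take to push that many creation writes through a single entity group. It treats the quoted 1 and 5 updates per second as steady sustained rates, which is an assumption made only for illustration; `EntityGroupBudget` and `daysToWrite` are hypothetical names.

```java
// Back-of-the-envelope arithmetic only; it does not call the datastore.
class EntityGroupBudget {
    /** Days needed to push `writes` writes through one entity group at a sustained rate. */
    static double daysToWrite(long writes, double writesPerSecond) {
        double seconds = writes / writesPerSecond;
        return seconds / 86_400.0; // 86,400 seconds in a day
    }

    public static void main(String[] args) {
        long users = 10_000_000L; // the 10M figure from the question
        System.out.printf("at 1 write/s:  %.1f days%n", daysToWrite(users, 1.0));
        System.out.printf("at 5 writes/s: %.1f days%n", daysToWrite(users, 5.0));
    }
}
```

That comes to about 116 days at one write per second and about 23 days at five. So the total volume is not the problem; the problem is that signups do not arrive evenly, and any burst above the sustained rate would contend on the single shared parent.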