0
votes

This question refers to database design using app engine and objectify. I want to discuss pros and cons of the approach of placing all (or let's say multiple) entities into a single "table".

Let's say I have a (very simplified) data model of two entities:

class User {
    @Index Long userId;
    String name;
}

class Message {
    @Index Long messageId;
    String message;
    private Ref<User> recipient;
}

At first glance, it makes no sense to put these into the same "table" as they are completely different.

But let's look at what happens when I want to search across all entities. Let's say I want to find and return users and messages, which satisfy some search criteria. In traditional database design I would either do two separate search requests, or else create a separate index "table" during writes where I repeat fields redundantly so that I can later retrieve items in a single search request.

Now let's look at the following design. Assume I would use a single entity, which stores everything. The datastore would then look like this:

Type | userId | messageId | Name | Message

USER | 123456 | empty | Jeff | empty
MESSAGE | empty | 789012 | Mark | This is text.

See where I want to go? I could now search for a Name and would find all Users AND Messages in a single request. I would even be able to add an index field, something like

@Index List index;

to the "common" entity and would not need to write data twice.

Given the behavior of the datastore that it never returns a record when searching for an indexed field which is empty, and combining this with partial indexes, I could also get the User OR Message by querying fields unique to a given Type.

The cost for storing long (non-normalized) records is not higher than storing individual records, as long as many fields are empty.

I see further advantages:

  • I could use the same "table" for auditing as well, as every record stored would form a "history" entry (as long as I don't allow updates, in which case I would need to handle this manually).
  • I can easily add new Types without extending the db schema.
  • When search results are returned over REST, I can return them in a single List, and the client looks at the Type.

There might be disadvantages as well, for example with caching, but maybe not. I can't see this at this point.

Anybody there, who has tried going down this route or who can see serious drawbacks to this approach?

2

2 Answers

2
votes

This is actually how the google datastore works under the covers. All of your entities (and everyone else's entities) are stored in a single BigTable that looks roughly like this:

{yourappid}/{key}/{serialized blob of your entity data}

Indexes are stored in three BigTables shared across all applications. I try to explain this in a fair amount of detail in my answer to this question: efficient searching using appengine datastore ancestor paths

So to rephrase your question, is it better to have Google maintain the Kind or to maintain it yourself in your own property?

The short answer is that having Google maintain the Kind makes it harder to query across all Kinds but makes it easier to query within one Kind. Maintaining the pseudo-kind yourself makes it easier to query across all Kinds but makes it harder to query within one Kind.

When Google maintains the Kind as per normal use, you already understand the limitation - there is no way to filter on a property across all different kinds. On the other hand, using a single Kind with your own descriminator means you must add an extra filter() clause every time you query:

 ofy().load().type(Anything.class).filter("discriminator", "User").filter("name >", "j")

Sometimes these multiple-filter queries can be satisfied with zigzag merges, but some can't. And even the ones that can be satisfied with zigzag aren't as efficient. In fact, this tickles the specific degenerative case of zigzags - low-cardinality properties like the discriminator.

Your best bet is to pick and choose your shared Kinds carefully. Objectify makes this easy for you with polymorphism: https://code.google.com/p/objectify-appengine/wiki/Entities#Polymorphism A polymorphic type hierarchy shares a single Kind (the kind of the base @Entity); Objectify manages the discriminator property for you and ensures queries like ofy().load().type(Subclass.class) are converted to the correct filter operation under the covers.

I recommend using this feature sparingly.

1
votes

One SERIOUS drawback to that will be indexes:

every query you do will write a separate index to be servable, then ALL writes you do will need to write to ALL these tables (for NO reason, in a good amount of cases).

I can't think of other drawbacks at the moment, except the limit of a meg per entity (if you have a LOT of types, with a LOT of values, you might run into this as you end up having a gazillion columns)

Not mentioning how big your ONE entity model would be, and how possibly convoluted your code to "triage" your entity types could end up being