1
votes

Going straight to the point: would it be possible to keep a normalized, two-dimensional model within the Google App Engine Datastore, where each relation is a kind in itself and its entities are instances of the relation?

I already know that the Datastore (with its underlying Bigtable technology) works differently from relational database systems, but my question is: what prevents one from still laying out their model in a relational way (with all its theoretical and planning advantages) within the Datastore?

An example to clarify. Couldn't I still model entities of the following kinds:

  • Person (Name:str, Company:Company)
  • Company (Name:str)
  • Project (Notes:text)
  • PersonProjects (Person:Person, Project:Project)

The properties that refer to other entities (e.g. Person.Company, PersonProjects.Project) would store those entities' ids. What would the major drawbacks (if any) be, performance-wise? Note that I could have normalized the model even further, e.g. by introducing new kinds for PersonName, CompanyName, etc., but I decided here to keep single-value properties within the kind they belong to.
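For concreteness, here is a minimal sketch of how I imagine these kinds, assuming the Python ndb API (the property names are just illustrative); the key-valued properties play the role of foreign-key columns:

    from google.appengine.ext import ndb

    class Company(ndb.Model):
        name = ndb.StringProperty()

    class Person(ndb.Model):
        name = ndb.StringProperty()
        company = ndb.KeyProperty(kind=Company)    # reference stored as a key

    class Project(ndb.Model):
        notes = ndb.TextProperty()

    class PersonProjects(ndb.Model):               # the many-to-many "join" kind
        person = ndb.KeyProperty(kind=Person)
        project = ndb.KeyProperty(kind=Project)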

I remember watching, some time ago, a video from Google's own I/O series in which normalization techniques were employed to prevent entities of a certain kind from becoming too large, i.e. having too many properties (the actual problem involved exploding indexes). One property of the planned kind was "detached" from it as a new kind, only to be joined back to it afterwards in code.

Well, couldn't I still do that for all of a kind's properties? I can't see any major issue except for the increase in client-side (or server-side) work, i.e. the work required to assemble an object after retrieval. So, is the switch to an "entity-based" model really necessary? Can't we simulate relations through kinds and entities?

I hope I've been clear enough.

You can, but you typically need to make compromises for performance reasons. I use intermediate entities when I need many-to-many relations. – Tim Hoffman
Thanks for the reply. But what exactly are the performance issues you're hinting at? That's basically what I need to know. – atava
You will know it when things start taking too long. Remember you can't do joins, so anything you want to retrieve through a query that depends on the values at the other end of a relation becomes expensive; if you need a value two levels away, it will cost a lot. Profile your application and you will get a better idea of what you need to optimize. At the moment your question is too open-ended and the correct answer depends on what you are doing. – Tim Hoffman
I do care about the final querying capabilities of the app. In a quasi-normalized model like the one assumed here, I would retrieve (or complete) my artificially constructed objects through multiple micro-queries on the normalized fields (as you say). Example: for a Person object I would get its Projects by querying PersonProjects for its Project values with Person = the object's id. I'm sure the Datastore offers some key-based querying facility, and I don't think that would be expensive since only the basic indexes are involved. So where is the performance bottleneck? Isn't querying a strength of the Engine? – atava
A get by key is fast; the problem arises when you iterate over hundreds or more entities from the primary query that each require fetching additional items. – Tim Hoffman
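
(To make the exchange above concrete, here is a minimal sketch of the micro-query approach and the per-row dereferencing it implies, assuming the ndb API and the kinds sketched in the question; person_key is a hypothetical key of the Person being looked up.)

    # Primary query: which projects is this person on?
    memberships = PersonProjects.query(
        PersonProjects.person == person_key).fetch(500)

    # Dereferencing each project key one by one costs one get per row:
    projects_slow = [m.project.get() for m in memberships]

    # Batching the gets into a single RPC helps, but there is still no way
    # to filter on Project properties within the same query:
    projects_fast = ndb.get_multi([m.project for m in memberships])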

1 Answer

1
votes

Nothing prevents you from normalising your model in Datastore. The problem is that Datastore has a very limited query language: inequality filters on only one property per query, no multi-kind queries, no JOINs, etc. This forces you to organise data according to your access pattern (access-oriented modelling), which often means storing data in illogical places just to get to it fast, i.e. with the minimum number of Datastore operations.
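
For example, with the kinds from the question there is no single query that filters Person on a property of Company. A minimal sketch, assuming the ndb API (the company name 'Acme' and the kind DenormalisedPerson are just illustrative):

    # (a) Without JOINs: resolve the company first, then filter on its key.
    acme_key = Company.query(Company.name == 'Acme').get(keys_only=True)
    people = Person.query(Person.company == acme_key).fetch()

    # (b) Access-oriented modelling: duplicate the value where it is queried,
    #     and keep the copy in sync in application code.
    class DenormalisedPerson(ndb.Model):
        name = ndb.StringProperty()
        company = ndb.KeyProperty(kind=Company)
        company_name = ndb.StringProperty()    # redundant copy of Company.name

    people = DenormalisedPerson.query(
        DenormalisedPerson.company_name == 'Acme').fetch()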

Additionally, transactions are quite limited, forcing you to organise data in a certain way (entity groups). If you use cross-group (XG) transactions instead, you are limited to 25 entity groups per transaction.
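
A sketch of what that looks like with the kinds from the question, assuming the ndb API (assign_project is a hypothetical helper; each entity here lives in its own entity group, hence the cross-group flag):

    @ndb.transactional(xg=True)
    def assign_project(person_key, project_key):
        # Reads and writes spanning several entity groups need xg=True;
        # the whole block commits atomically or not at all.
        if person_key.get() is None or project_key.get() is None:
            raise ValueError('unknown person or project')
        PersonProjects(person=person_key, project=project_key).put()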

Also note that there is no DB-enforced referential integrity, as is usual in SQL RDBMSs.
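
In practice that means a delete simply leaves dangling keys behind, and any cascading behaviour has to be written by hand. A sketch assuming the ndb API (delete_company is a hypothetical helper):

    # Nothing stops this: referencing Person entities keep their now-dangling
    # company keys, and person.company.get() simply returns None afterwards.
    company_key.delete()

    # Any "ON DELETE"-style behaviour is the application's job, e.g.:
    def delete_company(company_key):
        employees = Person.query(Person.company == company_key).fetch()
        for person in employees:
            person.company = None                # clear the reference
        ndb.put_multi(employees)
        company_key.delete()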