In the Google App Engine Datastore (HRD) with Java, we can't do joins or query multiple tables directly using the Query object or GQL.

I just want to know whether my idea is a correct approach or not.

Suppose we build an index in hierarchical order (parent, child, grandchild), keyed by node:

Node - Key - IndexedProperty - Set

When we want to collect all the children and grandchildren, we can gather all the keys that match the hierarchy filter condition and return that set of keys.
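As an illustration of this step, here is a minimal sketch in plain Java. It does not use the Datastore API: the `HierarchyIndex` class, its `descendantsOf` method, and the use of string keys are assumptions made only for this example, and it assumes the hierarchy is a tree with no cycles.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for the hierarchy index; not a Datastore API.
class HierarchyIndex {
    // Maps a node key to the keys of its direct children.
    private final Map<String, List<String>> children;

    HierarchyIndex(Map<String, List<String>> children) {
        this.children = children;
    }

    // Returns every key below the given node (children, grandchildren, ...),
    // visiting the hierarchy breadth-first. Assumes the hierarchy has no cycles.
    List<String> descendantsOf(String nodeKey) {
        List<String> result = new ArrayList<>();
        Deque<String> pending = new ArrayDeque<>();
        pending.add(nodeKey);
        while (!pending.isEmpty()) {
            String current = pending.poll();
            for (String child : children.getOrDefault(current, List.of())) {
                result.add(child);
                pending.add(child);
            }
        }
        return result;
    }
}
```

The returned list of keys is what would then be handed to a lookup by key.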

In Memcache we can hold each key pointing to its DB entity. If the cache does not have an entry, we can still fetch all the missing records from the DB in a single request using the set of keys.
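The cache-first lookup described here could look roughly like the sketch below. It is plain Java with two maps standing in for Memcache and the Datastore; the `CachedEntityReader` class, its `getAll` method, and the `batchGets` counter are hypothetical names for illustration, not part of any App Engine API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins: plain maps play the roles of Memcache and the Datastore.
class CachedEntityReader {
    private final Map<String, String> cache;      // key -> cached entity
    private final Map<String, String> datastore;  // key -> stored entity
    int batchGets = 0;                            // number of simulated batch reads

    CachedEntityReader(Map<String, String> cache, Map<String, String> datastore) {
        this.cache = cache;
        this.datastore = datastore;
    }

    // Looks every key up in the cache first, then fetches all misses
    // from the backing store in one batch and caches what was found.
    Map<String, String> getAll(List<String> keys) {
        Map<String, String> found = new LinkedHashMap<>();
        List<String> missing = new ArrayList<>();
        for (String key : keys) {
            String cached = cache.get(key);
            if (cached != null) {
                found.put(key, cached);
            } else {
                missing.add(key);
            }
        }
        if (!missing.isEmpty()) {
            batchGets++; // one round trip for all missing keys
            for (String key : missing) {
                String stored = datastore.get(key);
                if (stored != null) {
                    cache.put(key, stored);
                    found.put(key, stored);
                }
            }
        }
        return found;
    }
}
```

A second call with the same keys is served entirely from the cache, so the counter does not increase. The sketch ignores cache invalidation on writes, which a real setup would have to handle.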

Pros

1) Fast retrieval - Google recommends getting entities by key.

2) A single transaction is enough to collect data from multiple tables.

3) Memcache and the persistent Datastore will hold the data in the same form.

4) It will scan only the data related to the group, such as a user or parent node.

Cons

1) The metadata size will increase, so the overall DB size increases.

2) If the index of a single parent grows beyond 1 MB, we have to split it and save it as blobs in the DB.

Is this structure a good approach or not?

If the hierarchy has many deep levels, this avoids running a lot of query operations to collect all the items that depend on a parent.

In the case of multiple parents: collect all the indexes and get the keys related to the query, then collect all the data in a single transaction using the list of keys.

If anyone finds more pros or cons, please add them and explain whether this approach is correct or not.

Many thanks

Krishnan

1 Answer

There are quite a few things going on here that are important to think about:

Datastore is not a relational database. You definitely should not approach your data storage from a tables-and-joins perspective. It will lead to a messy and most likely inefficient setup.

It seems like you are trying to restructure your use of Datastore to get fully transactional and strongly consistent access to your data. The reason Datastore does not provide this natively is that offering these guarantees across all data, together with high availability, is too inefficient.

With the Datastore, you want to be able to support many (thousands, hundreds of thousands, millions, etc.) writes per second to different entities. The Datastore provides the notion of an entity group so that the developer can specify a specific scope of consistency.

Consider an example todo tracking service. You might define a User kind and a Todo kind. You wouldn't want to provide strong consistency across all Todos, since every time a user adds a new todo, the underlying system would have to ensure that it was written transactionally with respect to all other users writing todos. On the other hand, using entity groups, you can say that a single User represents your unit of consistency. This means that when a user writes a new todo, it only has to be updated transactionally with any other modification to that user's todos. This is a much better unit of consistency, since as your service scales to more users, they won't conflict with each other.
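To make the entity-group idea concrete, here is a minimal sketch in plain Java. It does not use the Datastore API: the `EntityGroups` class and the `Kind:id/Kind:id` string notation for key paths are assumptions made purely for illustration. The point it shows is that the root of a key path decides which group an entity belongs to.

```java
// Hypothetical helper: key paths are written as "Kind:id/Kind:id" strings for illustration.
class EntityGroups {
    // The root (first) element of a key path identifies the entity group.
    static String groupOf(String keyPath) {
        int slash = keyPath.indexOf('/');
        return slash < 0 ? keyPath : keyPath.substring(0, slash);
    }

    // Two writes can contend with each other only when they share an entity group.
    static boolean sameGroup(String keyPathA, String keyPathB) {
        return groupOf(keyPathA).equals(groupOf(keyPathB));
    }
}
```

With this model, `User:alice/Todo:1` and `User:alice/Todo:2` fall into the same group, while `User:bob/Todo:1` falls into a different one, so writes by different users never contend with each other.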

You are talking about creating and managing your own indexes. You almost certainly don't want to do this from an efficiency point of view. Further, you'd have to be very careful, since it seems you would have a huge number of writes to a single entity or range of entities that represents your table. This is a known Datastore anti-pattern.

One of the hard parts about the Datastore is that each project may have very different requirements and thus a very different data layout. There is definitely no one-size-fits-all way to structure your data, but here are some resources: