How would I achieve this using Google App Engine Datastore?

Question

I am a beginner to Datastore and I am wondering how I should use it to achieve what I want to do.

For example, my app needs to keep track of customers and all their purchases.

Coming from a relational database background, I would achieve this by creating [Customers] and [Purchases] tables. In Datastore, I can make [Customers] and [Purchases] kinds.

Where I am struggling is the structure of the [Purchases] kind.

If I make [Purchases] a child of the [Customers] kind, would there be one entity in [Customers] and one entity in [Purchases] that share the same key? Does this mean that inside this [Purchases] entity, I would have a property that just keeps growing with each purchase the customer makes?

Or would I have one [Purchases] entity for each purchase they make, and in each of these entities a property that points to an entity in the [Customers] kind?

How does Datastore perform in these scenarios?

dragonx · Accepted Answer · 2013-11-30T15:57:34

Sounds like you don't fully understand ancestors. Let's go with the non-ancestor version first, which is a legitimate way to go:

from google.appengine.ext import ndb

class Customer(ndb.Model):
    # customer data fields
    name = ndb.StringProperty()

class Purchase(ndb.Model):
    # key of the Customer entity this purchase belongs to
    customer = ndb.KeyProperty(kind=Customer)
    # purchase data fields
    price = ndb.IntegerProperty()

This is the basic way to go. You'll have one entity in the datastore for each customer. You'll have one entity in the datastore for each purchase, with a KeyProperty that points to the customer.

If you have a purchase and need to find the associated customer, it's right there:

purchase_entity.customer.get()

If you have a Customer, you can issue a query to find all the purchases that belong to the customer:

Purchase.query(Purchase.customer == customer_entity.key).fetch()

In this case, whenever you write either a customer or a purchase entity, the GAE datastore will write that entity to any one of the datastore machines running in the cloud that's not busy. You can get really high write throughput this way. However, when you query for all the purchases of a given customer, you just read back whatever is currently in the indexes. If a new purchase was added but the indexes have not been updated yet, you may get stale data (eventual consistency). You're stuck with this behavior unless you use ancestors.

Now for the ancestor version. The basic concept is essentially the same. You still have a customer entity, and separate entities for each purchase. The purchase is NOT part of the customer entity. However, when you create a purchase using a customer as an ancestor, it (roughly) means that the purchase is stored on the same machine in the datastore that the customer entity was stored on. In this case, your write performance is limited to the performance of that one machine, and is advertised as about one write per second for that group of entities (the customer plus all of its purchases). As a benefit though, you can query that machine using an ancestor query and get an up-to-date list of all the purchases of a given customer.
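To make the trade-off between the two versions more concrete, here is a toy in-memory model. It is only a sketch: `ToyDatastore`, `catch_up` and the tuple-based key paths are made-up names for illustration, not the ndb API and not how the datastore is actually implemented. It assumes that non-ancestor queries read from an index that lags behind writes, while ancestor queries see every committed write under the ancestor.

```python
class ToyDatastore:
    """Toy model only: NOT the ndb API and NOT the real datastore internals."""

    def __init__(self):
        self.entities = {}       # key path -> data, updated on every put
        self.global_index = []   # key paths visible to non-ancestor queries
        self._pending = []       # writes not yet reflected in global_index

    def put(self, key_path, data):
        # The entity itself is stored immediately; the index update is deferred.
        self.entities[key_path] = data
        self._pending.append(key_path)

    def catch_up(self):
        # Simulate the indexes catching up with earlier writes.
        self.global_index.extend(self._pending)
        self._pending = []

    def global_query(self, kind):
        # Non-ancestor query: reads only the (possibly lagging) index.
        return [k for k in self.global_index if k[-1][0] == kind]

    def ancestor_query(self, ancestor, kind):
        # Ancestor query: sees every stored entity below the ancestor.
        n = len(ancestor)
        return [k for k in self.entities
                if len(k) > n and k[:n] == ancestor and k[-1][0] == kind]


ds = ToyDatastore()
customer = (("Customer", 1),)                # hypothetical key path
purchase = customer + (("Purchase", 1),)     # child of the customer
ds.put(customer, {"name": "Alice"})
ds.put(purchase, {"price": 100})

stale = ds.global_query("Purchase")              # index has not caught up yet
fresh = ds.ancestor_query(customer, "Purchase")  # sees the new purchase
ds.catch_up()
eventual = ds.global_query("Purchase")           # now the purchase shows up
```

In this toy, `stale` is empty while `fresh` already contains the purchase, which is the behavior difference described above. The real system decides when the indexes catch up; you cannot trigger it yourself.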

The syntax for using ancestors is a bit different. The customer part is the same. However, when you create purchases, you pass the customer's key as the parent:

# the ndb model constructor takes the ancestor key as `parent`
purchase1 = Purchase(parent=customer_entity.key)
purchase2 = Purchase(parent=customer_entity.key)

This example creates two separate purchase entities. Each purchase will have a different key, and the customer has its own key as well. However, each purchase key will have the customer_entity's key embedded in it, so you can think of the purchase key as being twice as long. This also means you don't need to keep a separate KeyProperty() for the customer anymore, since you can find it in the purchase's key.

class Purchase(ndb.Model):
    # you don't need a KeyProperty for the customer anymore
    # purchase data fields
    price = ndb.IntegerProperty()

purchase.key.parent().get()

And in order to query for all the purchases of a given customer:

Purchase.query(ancestor=customer_entity.key).fetch()

The actual structure of the entities doesn't change much; it's mostly the syntax that differs. But the ancestor queries are fully consistent.
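If the idea of the customer's key being embedded in the purchase's key is still fuzzy, a toy model may help. This is a sketch, not the real ndb.Key class: `ToyKey` and its methods are hypothetical names, and it simply treats a key as a path of (kind, id) pairs.

```python
class ToyKey:
    """Hypothetical stand-in for a datastore key: a path of (kind, id) pairs."""

    def __init__(self, *pairs):
        self.pairs = tuple(pairs)

    def child(self, kind, entity_id):
        # A child key is the parent's full path plus one more pair.
        return ToyKey(*self.pairs, (kind, entity_id))

    def parent(self):
        # Dropping the last pair recovers the parent; root keys have none.
        if len(self.pairs) < 2:
            return None
        return ToyKey(*self.pairs[:-1])

    def __eq__(self, other):
        return isinstance(other, ToyKey) and self.pairs == other.pairs

    def __hash__(self):
        return hash(self.pairs)

    def __repr__(self):
        return "ToyKey%r" % (self.pairs,)


customer_key = ToyKey(("Customer", 42))            # made-up id
purchase1_key = customer_key.child("Purchase", 1)
purchase2_key = customer_key.child("Purchase", 2)
```

The two purchase keys are different from each other, but calling `parent()` on either one gives back the customer's key. That is why the separate KeyProperty becomes unnecessary in the ancestor version.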

The third option that you kinda describe is not recommended. I'm just including it for completeness. It's a bit confusing, and would go something like this:

class Purchase(ndb.Model):
    # purchase data fields
    price = ndb.IntegerProperty()

class Customer(ndb.Model):
    purchases = ndb.StructuredProperty(Purchase, repeated=True)

This is a special case which uses ndb.StructuredProperty. In this case, you will only have a single Customer entity in the datastore. While there's a class for purchases, your purchases won't get stored as separate entities - they'll just be stored as data within the Customer entity.

There may be a couple of reasons to do this. You're only dealing with one entity, so your data fetch will be fully consistent. You also have reduced write costs when you have to update a bunch of purchases, since you're only writing a single entity. And you can still query on the properties of the Purchase class. However, this was designed for a limited number of repeated objects, not hundreds or thousands. And each entity is limited to a total size of 1MB, so you'll eventually hit that limit and won't be able to add more purchases.
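For a rough feel of when that limit bites, here is a back-of-the-envelope estimate. The 200 bytes per embedded purchase is a made-up figure purely for illustration (the real size depends on your properties and how they are serialized), and `max_embedded_purchases` is a hypothetical helper, not part of ndb.

```python
ENTITY_LIMIT_BYTES = 1024 * 1024  # roughly the 1MB per-entity limit

def max_embedded_purchases(bytes_per_purchase, other_bytes=0):
    """Estimate how many embedded purchases fit in a single entity.

    bytes_per_purchase: assumed serialized size of one purchase
    other_bytes: assumed space used by the customer's own fields
    """
    usable = ENTITY_LIMIT_BYTES - other_bytes
    return max(usable // bytes_per_purchase, 0)

# Assumption: each purchase takes about 200 bytes once stored.
estimate = max_embedded_purchases(bytes_per_purchase=200)
```

Under that assumption the ceiling is a few thousand purchases per customer, and it shrinks as you add fields to each purchase, so a busy customer could hit it.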
