
So I've read all the RDBMS vs. BigTable debates.

I tried to model a simple game class using BigTable concepts.

Goals: very fast reads and reasonably easy writes.

Scenario: I have 500,000 user entities in my User model. Each user sees their statistics at the top of their game page (think of a status bar like in Mafia Wars), so wherever they go in the game, the stats get refreshed.

Since it gets called so frequently, why don't I model my User around that fact?

Code:

from google.appengine.ext import db

# simple User class for a game
class User(db.Model):
  username = db.StringProperty()

  # denormalized total, recomputed whenever the unit amounts change
  total_attack = db.IntegerProperty(default=0)

  # amounts default to 0 so the multiplication below never hits None
  unit_1_amount = db.IntegerProperty(default=0)
  unit_1_attack = db.IntegerProperty(default=10)

  unit_2_amount = db.IntegerProperty(default=0)
  unit_2_attack = db.IntegerProperty(default=20)

  unit_3_amount = db.IntegerProperty(default=0)
  unit_3_attack = db.IntegerProperty(default=50)

  def calculate_total_attack(self):
    # store the result so page views can read it without recomputing it
    self.total_attack = (self.unit_1_attack * self.unit_1_amount +
                         self.unit_2_attack * self.unit_2_amount +
                         self.unit_3_attack * self.unit_3_amount)

Here's how I'm approaching it (feel free to comment or critique):

Advantages:
1. Everything is in one big table
2. No need for ReferenceProperty, and no many-to-many relationships
3. Updates are easy: just get the user entity by its key name
4. It's easy to pass the fetched entity to the templating engine

Disadvantages:
1. If I have 100 different units with different capabilities (attack, defense, dexterity, magic, etc.), then I'll end up with a HUGE table.
2. If I have to change the value of a certain unit's attack, I'll have to go through all 500,000 user entities and change every one of them (maybe a cron job or the task queue will help).
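The task-queue idea in point 2 amounts to batched processing: each task updates one batch of users and hands a cursor to the next task. The sketch below is plain Python, assuming a list of dicts stands in for User entities and a list index stands in for a datastore cursor; `update_unit_attack` and `batch_size` are hypothetical names, not App Engine APIs.

```python
def update_unit_attack(users, unit_field, new_attack, batch_size=100):
    """Rewrite `unit_field` on every user, one batch at a time.

    Each loop iteration stands in for one task-queue task that handles a
    single batch and then passes a cursor (here: a list index) to the next.
    Returns how many batches ("tasks") were needed.
    """
    cursor = 0
    batches = 0
    while cursor < len(users):
        batch = users[cursor:cursor + batch_size]
        for user in batch:
            user[unit_field] = new_attack
            # the denormalized total must be recomputed as well
            user["total_attack"] = (user["unit_1_attack"] * user["unit_1_amount"]
                                    + user["unit_2_attack"] * user["unit_2_amount"]
                                    + user["unit_3_attack"] * user["unit_3_amount"])
        cursor += len(batch)
        batches += 1
    return batches
```

With 500,000 users and `batch_size=100` this works out to 5,000 batches, each a separate task: the cost of one balance change grows with the number of users.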

Each entity will be about 5-10 KB. (By the way, how do I check how large an entity is once I've uploaded it to the production server?)

So I'm counting on disk space on App Engine being cheap, since what I really need is to minimize the number of datastore API calls. I'll also try to memcache the entity for a period of time.
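A read-through cache with an expiry is one way to "memcache the entity for a period of time". This sketch uses a small in-process class that only mimics memcache's get/set-with-expiry behavior (the real service on App Engine is `google.appengine.api.memcache`); `TTLCache`, `get_user` and `load_from_datastore` are hypothetical names, and the injectable clock exists only to make expiry testable.

```python
import time


class TTLCache:
    """Tiny in-process stand-in for memcache: values expire after `ttl` seconds."""

    def __init__(self, ttl, clock=time.monotonic):
        self.ttl = ttl
        self.clock = clock
        self._store = {}  # key -> (expires_at, value)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self.clock() >= expires_at:
            del self._store[key]  # expired: behave like a cache miss
            return None
        return value

    def set(self, key, value):
        self._store[key] = (self.clock() + self.ttl, value)


def get_user(key_name, cache, load_from_datastore):
    """Serve the user from the cache; hit the datastore only on a miss."""
    user = cache.get(key_name)
    if user is None:
        user = load_from_datastore(key_name)
        cache.set(key_name, user)
    return user
```

One trade-off: any write to the User (for example after a fight) must also update or delete the cached copy, otherwise the status bar can show stale stats until the entry expires.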

In essence, everything here goes against relational (RDBMS) design.

Would love to hear your thoughts/ideas/experiences.


3 Answers


First, a simple answer to "how do I know how big an entity is?": once you've got some data in your app on the App Engine servers, go to your app's console and click the 'Datastore statistics' link. That gives you some basic stats on your entities, such as how much space each kind is using and which property types use the most disk space. I don't think you can drill down to a single User entity, however.

Now, some thoughts on your design. It is worth creating a separate table (kind) for your Units. Even if you end up with a few hundred units, it will be easy to keep them all in memcache, so looking up the details of each unit will cost almost nothing. It will cost you a few extra API calls to populate memcache with a unit's info the first time it is used, but after that you will save a good amount of CPU by not fetching each unit's details from the datastore, and you will save a huge number of API calls whenever you need to update a unit (which, as you already realized, would be very expensive with your current design). In addition, each User entity will use less disk space if it only holds a reference to a Unit entity rather than all of the unit's details. (Of course, this depends on how much information you store about each unit, but you did mention that you will eventually store lots of stats per unit.)
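A sketch of the lookup pattern described above: unit details are read through a cache, and changing a unit's attack is one write plus a cache invalidation instead of 500,000 User writes. Plain dicts stand in for the Unit kind and for memcache; `UnitRepository`, `set_attack` and `total_attack` are hypothetical names.

```python
class UnitRepository:
    """Read-through cache in front of a (simulated) Unit kind."""

    def __init__(self, datastore):
        self.datastore = datastore  # key_name -> {"attack": int, ...}
        self.cache = {}             # stands in for memcache
        self.datastore_reads = 0

    def get(self, key_name):
        unit = self.cache.get(key_name)
        if unit is None:
            self.datastore_reads += 1
            unit = self.datastore[key_name]
            self.cache[key_name] = unit
        return unit

    def set_attack(self, key_name, attack):
        # changing a unit is one write to one entity
        self.datastore[key_name] = dict(self.datastore[key_name], attack=attack)
        self.cache.pop(key_name, None)  # invalidate so the next read reloads it


def total_attack(repo, unit_counts):
    """unit_counts maps a unit key name to how many of that unit the user owns."""
    return sum(repo.get(key)["attack"] * count for key, count in unit_counts.items())
```

After the first lookup of each unit, computing a user's total attack touches only the cache.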

If you do have a separate table for Units, it also lets you keep your User model more flexible. Instead of needing a specific field for each unit, you could just have a list of references to units. That way, adding a unit type would not require modifying your User kind.
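To illustrate the flexibility point, here is a hypothetical User shape that holds a list of (unit key name, amount) pairs instead of one field per unit; "dragons" in the usage is an invented unit type, showing that a new type needs no change to the class. This is plain Python, not a datastore model: on App Engine the pairs would have to be stored in a form the datastore supports, for example as two parallel lists.

```python
from dataclasses import dataclass, field


@dataclass
class User:
    username: str
    # each entry is (unit key name, amount owned); no per-unit fields
    units: list = field(default_factory=list)

    def add_units(self, unit_key, amount):
        """Add `amount` of a unit type, appending it if the user has none yet."""
        for i, (key, count) in enumerate(self.units):
            if key == unit_key:
                self.units[i] = (key, count + amount)
                return
        self.units.append((unit_key, amount))
```

For example, `User("bob")` followed by `add_units("infantry", 10)`, `add_units("dragons", 1)` and `add_units("infantry", 5)` leaves `units` as `[("infantry", 15), ("dragons", 1)]`.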


You should create independent models for your units. "While a single entity or entity group has a limit on how quickly it can be updated, App Engine excels at handling many parallel requests distributed across distinct entities, and we can take advantage of this by using sharding." Have a look at this article. It may be useful.
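The quoted advice refers to sharding: spreading writes that would otherwise hit one entity across several entities. As an in-memory illustration only (each list slot stands in for a separate entity, and `ShardedCounter` is a hypothetical name), a sharded counter looks like this:

```python
import random


class ShardedCounter:
    """In-memory illustration of a sharded counter.

    In the datastore each shard would be its own entity, so concurrent
    increments usually land on different entities instead of contending
    for a single one.
    """

    def __init__(self, num_shards=20):
        self.shards = [0] * num_shards

    def increment(self, amount=1):
        # a write touches exactly one randomly chosen shard
        index = random.randrange(len(self.shards))
        self.shards[index] += amount

    def total(self):
        # a read has to sum every shard
        return sum(self.shards)
```

The trade-off shows in the code: increments touch one shard while reads touch all of them, so sharding suits values that are written far more often than they are read.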


Based on Peter's thoughts, I came up with the following revised User model. What do you think?

from google.appengine.ext import db

class Unit(db.Model):
  name = db.StringProperty()
  attack = db.IntegerProperty()

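# initialize 4 different types of units (this only needs to run once,
# e.g. from an admin-only handler, not every time the module is imported)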
Unit(key_name="infantry",name="Infantry",attack=10).put()
Unit(key_name="rocketmen",name="Rocketmen",attack=20).put()
Unit(key_name="grenadiers",name="Grenadiers",attack=30).put()
Unit(key_name="engineers",name="Engineers",attack=40).put()

class User(db.Model):
  username = db.StringProperty()

  # eg: [10,50,100,200] -> this represents 10 infantry, 50 rocketmen, 100 grenadiers and 200 engineers
  unit_list_count = db.ListProperty(item_type=int)

  # this holds the list of key names of each unit type: ["infantry","rocketmen","grenadiers","engineers"]
  unit_list_type = db.StringListProperty()

  # Total attack is not calculated inside the model. Instead, a controller
  # (a separate .py file) reads unit_list_count and unit_list_type from a
  # User entity and multiplies and sums them to get the total attack.

And yes, all the Unit entities will be memcached so they can be retrieved quickly when calculating total attack points.
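The controller-side calculation described in the model comments could look like the following sketch; `calculate_total_attack` and `UNIT_ATTACK` are hypothetical names, and the dict stands in for the memcached Unit entities (attack values taken from the four example units).

```python
# stands in for the memcached Unit entities: key name -> attack
UNIT_ATTACK = {"infantry": 10, "rocketmen": 20, "grenadiers": 30, "engineers": 40}


def calculate_total_attack(unit_list_count, unit_list_type, unit_attack):
    """Combine the two parallel lists stored on a User entity."""
    # the lists are only meaningful if position i refers to the same unit in both
    if len(unit_list_count) != len(unit_list_type):
        raise ValueError("unit_list_count and unit_list_type must be the same length")
    return sum(count * unit_attack[unit_type]
               for count, unit_type in zip(unit_list_count, unit_list_type))
```

With the example lists `[10, 50, 100, 200]` and `["infantry", "rocketmen", "grenadiers", "engineers"]` this gives 10×10 + 50×20 + 100×30 + 200×40 = 12,100. The length check matters because the parallel-list design relies on every write keeping both lists aligned.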

Would like to hear everyone's thoughts on this.