1
votes

I am using google app engine for my backend and datastore as the DB. The link https://cloud.google.com/datastore/docs/concepts/limits indicates that the maximum number of composite indexes for a project cannot be more than 200. I have around 130 composite indexes in my project and would hit the limit sometime in the future.

Limit of 200 seems very less to me. Let's say I have 5 modules in my project and each module has 10 "kinds" each. In each of the kinds, I have 4 properties I want to index upon (let's call them prop1, prop2, prop3 and prop4). Also each of the "kinds" have a field called creationTime, which stores the time at which the entity was created in the datastore. Whether I apply 0, 1 ,2 ,3 or all 4 of the filters, I always want my list of entities to be sorted by creationTime with newest first.

This seems to me to be a perfectly reasonable scenario. In this case for each "kind", I would have to define the following composite indexes

<datastore-index kind="kind1" ancestor="false">
        <property name="prop1" direction="asc" />
        <property name="creationTime" direction="desc" />
</datastore-index>
<datastore-index kind="kind1" ancestor="false">
        <property name="prop2" direction="asc" />
        <property name="creationTime" direction="desc" />
</datastore-index>
<datastore-index kind="kind1" ancestor="false">
        <property name="prop3" direction="asc" />
        <property name="creationTime" direction="desc" />
</datastore-index>
<datastore-index kind="kind1" ancestor="false">
        <property name="prop4" direction="asc" />
        <property name="creationTime" direction="desc" />
</datastore-index>

Since there are 50 such kinds, there would be 200 such indexes. Now I know I can avoid these indexes if I don't sort the list of entities by creationTime, but I think that would be really bad from the user perspective.

So is there any way to increase / overcome the limit? Am I missing something here? Do I need to limit my queries ? If yes then how can I achieve the same user experience? Is datastore not meant for such queries? What options do I have here?

2

2 Answers

1
votes

You cannot increase the limit, so you should look at your data model instead.

First, let's clear up terminology: What you are calling 'entities' are really called 'kinds'. Entities are the individual records in a kind.

Review your kinds and see if they are really semantically different, or if they are actually very similar (many overlapping properties). If they are similar, you can have them all in the same kind and add a property to distinguish between them; let's call it the type property.

For example, rather than having separate kinds for trolls, zombies, and witches, you could have a single kind called monsters.

Now, your example index:

<datastore-index kind="Entity1" ancestor="false">
        <property name="prop1" direction="asc" />
        <property name="creationTime" direction="desc" />
</datastore-index>

Would be the following:

<datastore-index kind="Master" ancestor="false">
        <property name="type" direction="Entity1" />
        <property name="prop1" direction="asc" />
        <property name="creationTime" direction="desc" />
</datastore-index>

What is good about this, is filter prop1 and sort by creationTime only takes one composite index, regardless of the number of types. So in your example of 50 kinds, rather than 50 composite indexes to cover each kind, you now only have 1.

0
votes

The only option I see to overcome such limit is to spread your modules into multiple apps, even one module per app if needed, basically by following the Project isolation GAE microservice architectural approach:

If you don't want to rely on these patterns to achieve isolation and you want a more formal enforcement of separation, you can use multiple App Engine projects. There are pros and cons to using projects instead of services, and you must balance the tradeoffs depending on your situation. Unless you have a specific need for one of the advantages offered by using multiple projects, it's best to start with using multiple services within a single project because performance will be better and the administrative overhead will be minimized. Of course, you can also choose some hybrid of the two approaches.

The max indexes limit is one of those multiple projects advantages, overall you'd multiply the limit by the number of projects.

Right below that section there is a comparison with the Service isolation architecture you're currently using.

But this approach only helps if each of your modules uses less indexes than the limit. If any one of them needs more you'll have to re-design it.

UPDATE:

Another possible approach is to optimize your index usage, in some cases it's possible to handle multiple different queries with:

However, there are situations where it's impossible to know the exact form of a query ahead of time, like when the filters of the query are constructed dynamically based on user input. In these cases, all possible combinations of filters must be supported by the indexes defined by the application. Traditionally, this has required a combinatorial explosion in the number of indexes defined. Recent enhancements to the App Engine Datastore query planner have eliminated the requirement for such a proliferation in an application's indexes. This article describes how to take full advantage of these enhancements.

...

The total number of indexes is 2^(number of filters) * (number of different orders) = 2 ^ 5 * 4 = 128 indexes

It is possible to specify this many indexes, but doing so has risks:

  • Potential to exceed the index cap (200)
  • Greatly increased storage cost per entity (as this cost includes the size of the index entries)

...

The number of index entries needed is (number of filters + 1) * (number of orders) = 7 * 4 = 28. This is a much more manageable number. Additionally, none of these indexes are exploding, so the additional storage cost of entities is similarly small.