Why do my AppEngine queries throw DatastoreNeedIndexExceptions although the required indices are 'serving'?

Question

I am running a multi-tenant java high-replication web application on Google AppEngine. The application successfully uses multi-property indices (configured within the datastore-indexes.xml file). Well, at least up until now...

Since today there is at least one namespace that throws DatastoreNeedIndexExceptions when executing a query. The curious thing is that the same query works in other namespaces.

Here is the index configuration from the datastore-indexes.xml and the index status from the admin panel:

<?xml version="1.0" encoding="utf-8"?>
<datastore-indexes autoGenerate="false">
    <datastore-index kind="Index_Asc_Asc_Asc_Asc" ancestor="false" source="manual">
        <property name="components" direction="asc"/>
        <property name="component_0" direction="asc"/>
        <property name="component_1" direction="asc"/>
        <property name="component_2" direction="asc"/>
        <property name="component_3" direction="asc"/>
    </datastore-index>
</datastore-indexes>

Index state within the admin panel

The corresponding query looks like this:

SELECT __key__ FROM Index_Asc_Asc_Asc_Asc WHERE components = '12340987hja' AND component_0 = 'asdfeawsefad' AND component_1 = '4FnlnSYiJuo25mNU' AND component_3 = 'cvxyvsdfsa' AND component_2 >= 0

When I execute this query within my application or the admin panel datastore view App Engine throws a DatastoreNeedIndexException with the following recommendation. Again, the same query works in other namespaces:

The suggested index for this query is:
<datastore-index kind="Index_Asc_Asc_Asc_Asc" ancestor="false">
  <property name="component_0" direction="asc" />
  <property name="component_1" direction="asc" />
  <property name="component_3" direction="asc" />
  <property name="components" direction="asc" />
  <property name="component_2" direction="asc" />
</datastore-index>

Same error within the admin panel

Investigations:

I have tried to set autoGenerate="true", but I do get the same error and no new indexes have been added.
I have tried to execute the query in newly created namespaces: No problems.
The error does not occur in the development server.

Is there something I am missing? Has anyone else experienced the same problem? Why is the same query working in other namespaces but not in that one?

Thanksalot!

The index suggested in the error is different to what you say you have defined. Have you manually added the suggested index definition ? — Tim Hoffman
Hello Tim, thank you for your comment! In which way is it different? Does property order matter? Is there some documentation to that? Thank you! — Holger
Have a look at the index specs line by line. They must match exactly. Order is important. Just cut and paste the index specification and deploy. Its all in the docs. — Tim Hoffman
Tim, you're right. Order is important. I've not yet found the corresponding info in the docs, but I just added the recommended index for testing purposes. It works. Unfortunately in my setup, blindly adding that index is not possible. However, thanks for your help! — Holger
Actually the docs do allude to it. The index is a combination of queries and sort orders. If you read how the indexing work, it's clear that each member of the index and the order has to match the query you are using. I am curious why you can't add an index spec that matches a query you wish to perform. As you have found out, without adding the index the query won't work. — Tim Hoffman

Jason Tholstrup Jason Tholstrup · Accepted Answer · 2013-09-14T04:59:39

Tim is right. To help clarify the point you need to understand how datastore works.

Basically all datastore reads need to be sequential in the index you are looking at. In other words they need to be in adjacent rows. This is how datastore gains speed and how it can be sharded across multiple machines. (there are some exceptions for equality matching but just accept that smart people figured that one out for us for now).

So looking at a set of data with a num column, and alpha column and an id column like the following:

    id   Num    Alpha
---------------------
     1     1        A
     2     1        Z
     3     4        A
   ...   ...      ...  <-- lots of data
100004     2        Z
100005     1        C

So when datastore comes through a query like yours it will look at the precomputed index and find the starting point of matches. It will then read until the rows no longer match the query. It never does a join like you are used to in SQL. The closes thing is a zipper merge which only applied to equality operators. ROWS MUST BE ADJACENT IN THE INDEX!

So index num asc, alpha asc looks like:

    id   Num    Alpha
---------------------
   ...   ...      ...  <- negative numbers
     1     1        A
100005     1        C
     2     1        Z
100004     2        Z
     3     4        A
   ...   ...      ...  <-- lots of data (assume all other num values were above 5)

and index alpha asc, num asc looks like:

    id   Num    Alpha
---------------------
     1     A        1
     3     A        4
   ...   ...      ...  <-- lots of data
100005     C        1
   ...   ...      ...  <-- lots of data
     2     Z        1
100004     Z        2
   ...   ...      ...  <-- lots of data

This allows datastore to quickly zip through your data to get an answer very fast. It can then use the id to look up the rest of that row's data.

If for example you tried to look at all of the num=1 and wanted all alpha's ordered sequentially it would have to read all of the num=1 rows into memory (which could be 100s of millions of rows) then sort them based on A. Here it's all precomputed and much faster. This allows for far more throughput on reads. It's probably overkill for your application but the idea is that your app can scale to huge sizes this way.

Hope that made sense.

Why do my AppEngine queries throw DatastoreNeedIndexExceptions although the required indices are 'serving'?

1 Answers