
Let's say I have two queries to run:

# Q1
Chair.query(ndb.AND(Chair.type == 'A', Chair.invented_at < '2014'))

# Q2
Chair.query(ndb.AND(Chair.type == 'A', Chair.cost == 2, Chair.invented_at < '2014'))

Q1 and Q2 each combine equality filters with an inequality filter on another property, so both queries require a composite index.

The index.yaml automatically generated by the development server would contain two indexes, one for each query:

# Index for Q1
- kind: Chair
  properties:
  - name: type
  - name: invented_at

# Index for Q2
- kind: Chair
  properties:
  - name: type
  - name: cost
  - name: invented_at

But wouldn't it be more efficient, in terms of storage and write operations, to keep only the second index (the one for Q2) and modify Q1 (which might be possible in some cases) so that it uses the Q2 index too? Or is it easier for the datastore to work with more, but smaller, indexes?

This question is completely hypothetical; I am just wondering how the datastore would react.

1 Answer


The datastore never computes anything at query time; it literally just reads entries in the order in which they appear in the index.

The presence of cost in the middle of the Q2 index means that Q1 can't use it. It's that simple: the index is ordered by type, then cost, then invented_at. In principle the system could read just the entries it needs, but it won't skip over a column to do so.
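To make the ordering argument concrete, here is a minimal sketch in plain Python. It is a toy model only, not how the datastore is implemented: each index is modelled as a sorted list of tuples, and a query as a single contiguous range scan over that list. The sample entities and the helper `range_scan` are made up for illustration.

```python
from bisect import bisect_left

# Hypothetical Chair entities as (type, cost, invented_at) tuples.
chairs = [
    ("A", 1, "2010"),
    ("A", 1, "2016"),
    ("A", 2, "2012"),
    ("A", 2, "2015"),
    ("B", 2, "2011"),
]

# Toy index for Q1: rows sorted by (type, invented_at).
index_q1 = sorted((t, inv) for t, _cost, inv in chairs)
# Toy index for Q2: rows sorted by (type, cost, invented_at).
index_q2 = sorted(chairs)


def range_scan(index, lo, hi):
    """Return the contiguous run of index rows with lo <= row < hi."""
    return index[bisect_left(index, lo):bisect_left(index, hi)]


# Q1 on its own index: type == 'A' and invented_at < '2014' is one range.
q1_rows = range_scan(index_q1, ("A", ""), ("A", "2014"))

# Q2 on its own index: type == 'A', cost == 2, invented_at < '2014'
# is also one range.
q2_rows = range_scan(index_q2, ("A", 2, ""), ("A", 2, "2014"))

# Positions of the rows matching Q1 inside the Q2 index.
q1_positions = [
    i for i, (t, _cost, inv) in enumerate(index_q2)
    if t == "A" and inv < "2014"
]
# True only if the matching rows form one unbroken run in the Q2 index.
q1_contiguous_in_q2 = q1_positions == list(
    range(q1_positions[0], q1_positions[-1] + 1)
)

print(q1_rows)
print(q2_rows)
print(q1_positions, q1_contiguous_in_q2)
```

In this toy data the rows matching Q1 are scattered through the Q2 index, because each cost value starts a new run of invented_at values. A single in-order read cannot return them without skipping over rows.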

The zigzag merge join algorithm can help minimize the number of indexes you need, but not in this situation. You can also find more info here