2
votes

Explanation about the functionality in question:

Postgres (Index Combination), MySQL (Index Merge) & MongoDB (Index Intersection) have a functionality, where the DB uses multiple single-column indices (indexes) when there is no multi-column index found for a given query that has multiple columns in the where clause. Here is what Postgres' documentation speaks about this feature - https://www.postgresql.org/docs/8.3/indexes-bitmap-scans.html

An excerpt from the link

Beginning in release 8.1, PostgreSQL has the ability to combine multiple indexes (including multiple uses of the same index) to handle cases that cannot be implemented by single index scans. The system can form AND and OR conditions across several index scans. For example, a query like WHERE x = 42 OR x = 47 OR x = 53 OR x = 99 could be broken down into four separate scans of an index on x, each scan using one of the query clauses. The results of these scans are then ORed together to produce the result. Another example is that if we have separate indexes on x and y, one possible implementation of a query like WHERE x = 5 AND y = 6 is to use each index with the appropriate query clause and then AND together the index results to identify the result rows.

My use case:

I would like to build a UI where a user can search (filter) for an entity using multiple fields (30+ fields currently and growing) in the table. The number of filtered entities will need to be shown in the UI & refreshed on each update the user makes to the filter. So, it is implicit that it needs to be fast (preferably < 1s). Creating multiple column indices for all possible combinations is not feasible and might be inefficent even if it is done.

Following is what I have observed by running a few queries.

Case 1:
select count(*) from TableName@{FORCE_INDEX=_BASE_TABLE} where stringColumn = 'str1';
Table Scan: TableName (full scan: true)    ~11.72s


Case 2:
select count(*) from TableName where stringColumn = 'str1';
Index Scan: IndexForStringColumn    1.55s


Case 3:
select count(*) from TableName where ts > '2019-01-01';
Index Scan: IndexForTS    450902    1    985.66 ms


Case 4:
select count(*) from TableName where stringColumn = 'str1' and ts > '2019-01-01';
Index Scan: IndexForTS    450903    1    1.07 s
  • Case 1 to 3. As expected. Case 1 did not use any index and hence a TableScan for 11.72s.
  • Case 4 is the anomaly. It says it only used IndexForTS. But the runtime seems to be far lower (1.07s). Looks like this used IndexForStringColumn as well.

Questions:

  1. Does Google Cloud Spanner support this functionality of using multiple single column indices for a single query ? When I tried to run some benchmarks in the Cloud Spanner, it looks like it is supported but there is no official documentation about this.
  2. If it is not supported, is the any other way of building this functionality with Google Cloud spanner ?

Thanks in advance for the help!

1

1 Answers

3
votes
  1. Unfortunately, index intersection and union are in the backlog. Cloud spanner will choose an index if applicable but the scope is limited to a single index. If you have a big conjunct, the most selective single column index will be used.

  2. You can always craft index intersection and union by rewriting your SQL statement. For example,

SELECT * FROM A WHERE x = 1 AND y = 1;

can be rewritten as

SELECT * FROM A WHERE key IN ((SELECT key FROM A WHERE x = 1) INTERSECT (SELECT key FROM A FROM y = 1));

Similarly,

SELECT * FROM A WHERE x = 1 OR y = 1;

can be rewritten as

SELECT * FROM A WHERE key IN ((SELECT key FROM A WHERE x = 1) UNION (SELECT key FROM A FROM y = 1)); -- You may add ALL if you do not expect large number of rows satisfying either predicate.

Hope this help.