We have an HDInsight cluster running HBase (Ambari)
- We have created a table using Phoenix:
CREATE TABLE IF NOT EXISTS Results (Col1 VARCHAR(255) NOT NULL, Col2 INTEGER NOT NULL, Col3 INTEGER NOT NULL, Destination VARCHAR(255) NOT NULL CONSTRAINT pk PRIMARY KEY (Col1, Col2, Col3)) IMMUTABLE_ROWS=true
- We have filled this table with some data (using Java code; a sketch follows)
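For context, here is a minimal sketch of what the loader does, assuming the Phoenix thick JDBC driver; the ZooKeeper quorum and the values are placeholders, not our real configuration:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class ResultsLoader {
    public static void main(String[] args) throws Exception {
        // "zk-host:2181" is a placeholder for the cluster's ZooKeeper quorum
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:zk-host:2181");
             PreparedStatement ps = conn.prepareStatement(
                     "UPSERT INTO RESULTS (COL1, COL2, COL3, DESTINATION) VALUES (?, ?, ?, ?)")) {
            ps.setString(1, "data"); // placeholder values
            ps.setInt(2, 1);
            ps.setInt(3, 2);
            ps.setString(4, "some value");
            ps.executeUpdate();
            conn.commit(); // Phoenix connections do not auto-commit by default
        }
    }
}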
- Later, we decided to create a local index on the Destination column:
CREATE LOCAL INDEX DESTINATION_IDX ON RESULTS (destination) ASYNC
- We have run the IndexTool MapReduce job to populate the index:
hbase org.apache.phoenix.mapreduce.index.IndexTool --data-table RESULTS --index-table DESTINATION_IDX --output-path DESTINATION_IDX_HFILES
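As a sanity check, the index state after the job can be inspected in SYSTEM.CATALOG (a sketch based on our understanding of that schema; INDEX_STATE = 'a' should mean ACTIVE):

select TABLE_NAME, INDEX_STATE from SYSTEM.CATALOG where TABLE_NAME = 'DESTINATION_IDX' and INDEX_STATE is not null ;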
- When we run queries that filter on the DESTINATION column, everything is fine. For example:
select /*+ NO_CACHE, SKIP_SCAN */ COL1,COL2,COL3,DESTINATION from Results where COL1='data' AND DESTINATION='some value' ;
- But if we do not include DESTINATION in the WHERE clause, we get a NullPointerException in BaseResultIterators
(from phoenix-core-4.7.0-HBase-1.1.jar)
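For example, this query (the same as above, just without the DESTINATION predicate) triggers the exception:
select /*+ NO_CACHE, SKIP_SCAN */ COL1,COL2,COL3,DESTINATION from Results where COL1='data' ;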
- This exception is thrown only when the new local index is used. If we bypass the index with the NO_INDEX hint, like this:
select /*+ NO_CACHE, SKIP_SCAN, NO_INDEX */ COL1,COL2,COL3,DESTINATION from Results where COL1='data' AND DESTINATION='some value' ;
we do not get the exception.
Here is the relevant code from the area where the exception is thrown:
...
catch (StaleRegionBoundaryCacheException e2) {
    // Catch only to try to recover from region boundary cache being out of date
    if (!clearedCache) { // Clear cache once so that we rejigger job based on new boundaries
        services.clearTableRegionCache(physicalTableName);
        context.getOverallQueryMetrics().cacheRefreshedDueToSplits();
    }
    // Resubmit just this portion of work again
    Scan oldScan = scanPair.getFirst();
    byte[] startKey = oldScan.getAttribute(SCAN_ACTUAL_START_ROW);
    byte[] endKey = oldScan.getStopRow();
    // ==================== Note: isLocalIndex is true ====================
    if (isLocalIndex) {
        endKey = oldScan.getAttribute(EXPECTED_UPPER_REGION_KEY);
        // endKey is null for some reason at this point, and the next call
        // will fail inside it with an NPE
    }
    List<List<Scan>> newNestedScans = this.getParallelScans(startKey, endKey);
We must use this version of the jar, since we run inside Azure HDInsight and cannot select a newer version.
Any ideas how to solve this? What does "recover from region boundary cache being out of date" mean? It seems to be related to the problem.