Hi all,

I recently wrote a coprocessor for HBase (0.94.17): a class that extends BaseEndpointCoprocessor and exposes a rowCount method to count the rows of one table.

And I ran into a problem.

If I do not set a filter on the scan, my code works fine for two tables. One table has 1,000,000 rows and the other has 160,000,000 rows. It took about 2 minutes to count the bigger table.

However, if I set a filter on the scan, it only works on the small table. On the bigger table it throws an exception: org.apache.hadoop.hbase.ipc.ExecRPCInvoker$1@2c88652b, java.io.IOException: java.io.IOException: java.lang.IndexOutOfBoundsException: Index: 0, Size: 0

Trust me, I have checked my code over and over again.

So, to count my table with a filter, I had to write the following clumsy workaround: I do not set the filter on the scan, and after I get each row, I apply the filter to it myself in a helper method.

This works on both tables.

But I do not know why.

I tried to read the scanner source code in HRegion.java, but I could not follow it.

So if you know the answer, please help me. Thank you.

@Override
    public long rowCount(Configuration conf) throws IOException {
        // Workaround: scan without a filter and apply the filter manually to each row below.
        Scan scan = new Scan();
        parseConfiguration(conf);
        Filter filter = null;
        if (this.mFilterString != null && !mFilterString.equals("")) {
            ParseFilter parse = new ParseFilter();
            filter = parse.parseFilterString(mFilterString);
            // scan.setFilter(filter);
        }

        scan.setCaching(this.mScanCaching);
        InternalScanner scanner = ((RegionCoprocessorEnvironment) getEnvironment()).getRegion().getScanner(scan);
        long sum = 0;

        try {
            List<KeyValue> curVals = new ArrayList<KeyValue>();
            boolean hasMore = false;
            do {
                curVals.clear();
                hasMore = scanner.next(curVals);
                if (filter != null) {
                    filter.reset();
                    if (HbaseUtil.filterOneResult(curVals, filter)) {
                        continue;
                    }
                }
                sum++;
            } while (hasMore);

        } finally {
            scanner.close();
        }
        return sum;
    }

The following is my HbaseUtil helper code:

public static boolean filterOneResult(List<KeyValue> kvList, Filter filter) {
        if (kvList.size() == 0)
            return true;
        KeyValue kv = kvList.get(0);
        if (filter.filterRowKey(kv.getBuffer(), kv.getRowOffset(), kv.getRowLength())) {
            return true;
        }

        for (KeyValue kv2 : kvList) {
            if (filter.filterKeyValue(kv2) == Filter.ReturnCode.NEXT_ROW) {
                return true;
            }
        }
        filter.filterRow(kvList);
        // filterRow() returns true when the whole row should be excluded.
        return filter.filterRow();
    }
And my filter is a SingleColumnValueFilter, and a very short one: SingleColumnValueFilter('F','s',=,'binary:0',true,true) - dape

1 Answer

OK, it was my mistake. After I used jdb to debug my code, I got the following exception:

 "org.apache.hadoop.ipc.RemoteException: java.io.IOException: java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
    at java.util.ArrayList.rangeCheck(ArrayList.java:635)
    at java.util.ArrayList.get(ArrayList.java:411)

Clearly, my result list was empty when I called get(0) on it. The list is filled by this call:

hasMore = scanner.next(curVals);

This means that when a filter is set on the scan, the curVals list may be empty even though hasMore is true.

I had assumed that when a row was filtered out, the scanner would jump to the next row, so the list would never be empty. I was wrong.
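
To illustrate the pattern, here is a minimal, self-contained sketch of a counting loop that tolerates empty result lists. It does not use the HBase classes: RowScanner is a hypothetical stand-in whose next(List) I am assuming behaves like the scanner.next(curVals) call above (it fills the list and returns whether more rows may follow), and replay is a made-up helper that feeds fixed batches, with an empty batch modelling a row that the filter dropped.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

public class EmptyBatchCount {

    /** Hypothetical stand-in for the region scanner: fills `out` with one
     *  row's cells and returns true if more rows may follow. */
    interface RowScanner {
        boolean next(List<String> out);
    }

    /** Counts rows, skipping calls that returned no cells at all. */
    static long countRows(RowScanner scanner) {
        List<String> cells = new ArrayList<String>();
        long sum = 0;
        boolean hasMore;
        do {
            cells.clear();
            hasMore = scanner.next(cells);
            // An empty list is not a row, even when hasMore is true.
            if (!cells.isEmpty()) {
                sum++;
            }
        } while (hasMore);
        return sum;
    }

    /** Made-up test helper: replays fixed batches, one per next() call. */
    static RowScanner replay(List<List<String>> batches) {
        final Iterator<List<String>> it = batches.iterator();
        return new RowScanner() {
            public boolean next(List<String> out) {
                if (it.hasNext()) {
                    out.addAll(it.next());
                }
                return it.hasNext();
            }
        };
    }

    public static void main(String[] args) {
        List<List<String>> batches = Arrays.asList(
                Arrays.asList("row1/F:s=0"),
                Collections.<String>emptyList(),   // filtered-out row
                Arrays.asList("row3/F:s=0"),
                Collections.<String>emptyList());  // empty final call
        System.out.println(countRows(replay(batches))); // prints 2
    }
}
```

In the real coprocessor the same guard would go right after scanner.next(curVals), before anything reads curVals.get(0).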

Also, my client did not print the remote error message on my console. It just caught the remote exception and retried. After 10 retries, it printed a different exception, which told me nothing useful.