2
votes

I'm trying to use Apache Phoenix to run SQL queries on HBase tables. According to the official documentation, a schema needs to be created for existing tables with the SQL query:

CREATE TABLE TABLE_NAME (....)

I tried to avoid this by connecting directly to an existing table (created with the HBase API) through the Phoenix API, but I got exceptions. The thing is, when Phoenix executes this query, it adds a lot of things to the table. For instance, in the tables section of the HBase dashboard, I can see the following metadata added by Phoenix to my table:

'QUOTES', {METHOD => 'table_att', coprocessor$1 => '|org.apache.phoenix.coprocessor.ScanRegionObserver|1|', coprocessor$2 => '|org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver|1|', coprocessor$3 => '|org.apache.phoenix.coprocessor.GroupedAggregateRegionObserver|1|', coprocessor$4 => '|org.apache.phoenix.coprocessor.ServerCachingEndpointImpl|1|', coprocessor$5 => '|org.apache.phoenix.hbase.index.Indexer|1073741823|index.builder=org.apache.phoenix.index.PhoenixIndexBuilder,org.apache.hadoop.hbase.index.codec.class=org.apache.phoenix.index.PhoenixIndexCodec'}, {NAME => '0', DATA_BLOCK_ENCODING => 'FAST_DIFF', KEEP_DELETED_CELLS => 'true'}

It looks like Phoenix is changing the table's metadata (it adds some coprocessors and index builders). Is this going to create problems in production (interfere with code that uses the HBase API)? If so, how can I avoid it?

1 Answer

4
votes

Yes, Apache Phoenix adds coprocessors to the metadata for the underlying HBase tables when you do a CREATE TABLE or CREATE VIEW as documented here. These will not interfere with code that uses the HBase APIs, as any processing done by the coprocessors is only triggered when Phoenix-specific attributes are set by the client making the API call.

For a Phoenix VIEW, only these metadata changes are made. For a Phoenix TABLE, in addition to these metadata changes, an empty KeyValue is added to every row of the table. This is done to improve performance as well as to prevent the row from "disappearing" if all columns are set to null. More details here.
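
If you want only the metadata changes and no empty KeyValue written to your rows, mapping the existing table as a view is probably the lighter option. Here is a minimal sketch, assuming a hypothetical HBase table named quotes_raw with a column family cf and qualifiers symbol and price (none of these names come from your table, so adjust them). It also assumes the values were written as plain strings, since Phoenix can only read bytes that match the declared column type.

```sql
-- Hypothetical names: HBase table "quotes_raw", column family "cf",
-- qualifiers "symbol" and "price". Replace them with your own.
-- Double quotes keep the names case-sensitive; unquoted identifiers
-- are upper-cased by Phoenix and would not match lower-case HBase names.
CREATE VIEW "quotes_raw" (
    pk VARCHAR PRIMARY KEY,     -- maps to the HBase row key
    "cf"."symbol" VARCHAR,      -- maps to cf:symbol
    "cf"."price"  VARCHAR       -- maps to cf:price
);

-- Query the mapped columns through Phoenix.
SELECT pk, "cf"."symbol", "cf"."price" FROM "quotes_raw" LIMIT 10;
```

A view mapped directly onto an HBase table like this is read-only from the Phoenix side, so writes would keep going through the HBase API, and dropping the view should leave the underlying HBase data in place.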