I am using Apache Phoenix to simplify my data retrieval/update operations on HBase. From a performance point of view, which would be better: Phoenix, or a custom wrapper using the HBase native API? Or is there another approach that does not hurt performance?
2 Answers
In a perfect world the native API will be faster, but you will have to keep working on it all the time, because developing a good API for HBase is a separate project. A big project.
You will also need an excellent understanding of map-reduce and HBase internals. But Phoenix already does all this for you. For example, secondary indexing lets you create and automatically maintain global indexes over your primary table. Queries automatically use an index when that is more efficient, turning full table scans into point and range scans. Multiple columns may be indexed in ascending or descending sort order. Additional primary table columns may be included in the index to form a covered index. Indexing is available in two flavors: server-side index maintenance for mutable data, and client-side index maintenance optimized for write-once, append-only use cases.
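To make the covered-index idea concrete, here is a minimal sketch of the DDL you would send through Phoenix's JDBC driver. The `events` table, its columns, and the index name are hypothetical, chosen only for illustration; the `apply` method assumes you already have a `Connection` to a Phoenix cluster.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.List;

public class PhoenixIndexSketch {
    // Hypothetical table: the composite primary key becomes the HBase row key.
    static final String CREATE_TABLE =
        "CREATE TABLE IF NOT EXISTS events ("
        + " host VARCHAR NOT NULL,"
        + " created_at DATE NOT NULL,"
        + " status VARCHAR,"
        + " payload VARCHAR,"
        + " CONSTRAINT pk PRIMARY KEY (host, created_at))";

    // Hypothetical global index on status (descending). INCLUDE copies payload
    // into the index, making it a covered index for queries on these columns.
    static final String CREATE_INDEX =
        "CREATE INDEX IF NOT EXISTS events_status_idx"
        + " ON events (status DESC) INCLUDE (payload)";

    static List<String> ddl() {
        return List.of(CREATE_TABLE, CREATE_INDEX);
    }

    // Runs the DDL on an existing connection (assumed to point at Phoenix).
    static void apply(Connection conn) throws SQLException {
        try (Statement stmt = conn.createStatement()) {
            for (String sql : ddl()) {
                stmt.execute(sql);
            }
        }
    }

    public static void main(String[] args) {
        // Without a cluster at hand, just print the statements.
        ddl().forEach(System.out::println);
    }
}
```

With the native API you would have to create the index table yourself and keep it in sync on every write; here Phoenix maintains it after a single statement.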
Phoenix also does skip scans and map-reduce for you, and much more. Look here: http://phoenix-hbase.blogspot.com/ or http://phoenix.apache.org/performance.html#
The team that works on Phoenix has spent a lot of time optimising all these operations, so if you want to write against the native API, you must be sure that you can do it better.
As another solution you can use Hive or SparkSQL. But Hive has lower performance, and Spark needs a separate cluster and is also a difficult technology.
Another very good, high-performance technology is SparkOnHBase: http://blog.cloudera.com/blog/2014/12/new-in-cloudera-labs-sparkonhbase/ This one is faster, but much more complex. It also lacks some good features such as indexes and HBase native functions, so you would need to write them yourself.
Phoenix will be a good fit, as it converts an SQL-like query into native HBase calls using a better understanding of the inner workings of HBase. It implements co-processors for you and maintains indexes for you, which would be a cumbersome process if you implemented it via the HBase API. So Phoenix makes querying HBase easier. Since it was created by Salesforce, you can rely on it. It also has good community support.
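As a rough illustration of what "SQL instead of native calls" looks like from the client side, here is a minimal JDBC sketch. The `metrics` table and its columns are hypothetical, and the JDBC URL passed on the command line (something of the form `jdbc:phoenix:<zookeeper quorum>`) is an assumption about your setup; the Phoenix client jar must be on the classpath for a real connection.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class PhoenixQuerySketch {
    // Hypothetical table used only for this example.
    static final String CREATE_TABLE =
        "CREATE TABLE IF NOT EXISTS metrics ("
        + " host VARCHAR NOT NULL,"
        + " metric VARCHAR NOT NULL,"
        + " val BIGINT,"
        + " CONSTRAINT pk PRIMARY KEY (host, metric))";
    // Phoenix uses UPSERT rather than separate INSERT and UPDATE statements.
    static final String UPSERT =
        "UPSERT INTO metrics (host, metric, val) VALUES (?, ?, ?)";
    // Filtering on the leading primary key column lets Phoenix scan a key range.
    static final String SELECT =
        "SELECT metric, val FROM metrics WHERE host = ?";

    static void writeAndRead(Connection conn) throws SQLException {
        try (Statement stmt = conn.createStatement()) {
            stmt.execute(CREATE_TABLE);
        }
        try (PreparedStatement up = conn.prepareStatement(UPSERT)) {
            up.setString(1, "web-1");
            up.setString(2, "cpu");
            up.setLong(3, 42L);
            up.executeUpdate();
        }
        conn.commit(); // Phoenix connections do not auto-commit by default
        try (PreparedStatement q = conn.prepareStatement(SELECT)) {
            q.setString(1, "web-1");
            try (ResultSet rs = q.executeQuery()) {
                while (rs.next()) {
                    System.out.println(rs.getString(1) + " = " + rs.getLong(2));
                }
            }
        }
    }

    public static void main(String[] args) throws SQLException {
        if (args.length == 1) {
            // args[0] is the JDBC URL of your cluster (an assumption of this sketch).
            try (Connection conn = DriverManager.getConnection(args[0])) {
                writeAndRead(conn);
            }
        } else {
            // No cluster given: only show the statements that would be sent.
            System.out.println(CREATE_TABLE);
            System.out.println(UPSERT);
            System.out.println(SELECT);
        }
    }
}
```

The equivalent with the native API would mean building `Put` and `Scan` objects and encoding the row key bytes by hand.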