
From this article, Phoenix is a good fit for fast HBase lookups. Its secondary indexing feature supports many SQL constructs and makes lookups via non-primary-key fields more efficient than full table scans. It simplifies the creation and management of typed, row-centric data by providing composite row keys and by enforcing constraints on data written through the Phoenix interfaces. Another interesting feature is sequences, familiar from relational databases such as Oracle, which Phoenix makes available in a distributed environment.
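As a sketch of those three features (the table, column, and sequence names here are made up for illustration), a Phoenix table with a composite row key, a secondary index, and a sequence might look like:

```sql
-- Composite row key: (tenant_id, event_time) together form the HBase row key.
CREATE TABLE IF NOT EXISTS event_log (
    tenant_id  VARCHAR   NOT NULL,
    event_time TIMESTAMP NOT NULL,
    payload    VARCHAR,
    status     INTEGER
    CONSTRAINT pk PRIMARY KEY (tenant_id, event_time)
);

-- Secondary index so lookups by status avoid a full table scan.
CREATE INDEX idx_event_status ON event_log (status);

-- A sequence, as in Oracle and other RDBMSs, but usable across the cluster.
CREATE SEQUENCE event_seq START WITH 1 INCREMENT BY 1;
UPSERT INTO event_log (tenant_id, event_time, status)
    VALUES ('tenant-' || NEXT VALUE FOR event_seq, NOW(), 0);
```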

However, joins on a NoSQL database are generally expensive: a join may require scanning each region and broadcasting the results to the other regions. One major benefit of Phoenix is being able to use SQL on HBase, and joins are an important part of SQL. So what is the point of Phoenix if joins are expensive on a NoSQL database?
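To make the concern concrete (the `orders`/`customers` tables are hypothetical), Phoenix by default runs a broadcast hash join: the smaller side is built into a hash table and shipped to every region server, which is exactly the cost pattern described above. When both sides are large, a sort-merge join can be requested with a hint:

```sql
-- Default plan: broadcast hash join (right-hand side shipped to all regions).
SELECT o.order_id, c.name
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id;

-- Hinted plan: sort-merge join, which avoids broadcasting a large build side.
SELECT /*+ USE_SORT_MERGE_JOIN */ o.order_id, c.name
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id;
```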

Can I say that a good use case for Phoenix is one that does not involve a lot of joins?


1 Answer


The following is my take; it is subjective. I am biased toward Phoenix because I use it a lot!

a) The SQL semantics are a big plus.

b) Phoenix can also parallelize queries using an internal mechanism (Phoenix guideposts).
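Guideposts are derived from table statistics and let Phoenix chop a single region's scan into parallel chunks. A rough sketch of how you refresh and tune them (the table name is the hypothetical one used above; the `GUIDE_POSTS_WIDTH` property is what I believe recent Phoenix versions expose):

```sql
-- Collect statistics so Phoenix can place guideposts for this table.
UPDATE STATISTICS event_log;

-- Tune guidepost spacing (in bytes) per table; smaller widths mean more
-- guideposts and finer-grained intra-region parallelism.
ALTER TABLE event_log SET GUIDE_POSTS_WIDTH = 10000000;
```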

c) Phoenix provides a nice way to pre-split tables (a 1-byte salt) that can help you avoid hotspotting.
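The salt is declared at table creation time (table and columns here are illustrative):

```sql
-- SALT_BUCKETS prepends a 1-byte hash of the row key, pre-splitting the
-- table into 16 regions and spreading sequential writes across them.
CREATE TABLE metrics (
    host  VARCHAR   NOT NULL,
    ts    TIMESTAMP NOT NULL,
    value DOUBLE
    CONSTRAINT pk PRIMARY KEY (host, ts)
) SALT_BUCKETS = 16;
```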

d) Deletes with partial keys are tough in raw HBase, but you can certainly do them in Phoenix.
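For example, assuming a hypothetical table with composite key (tenant_id, event_time), deleting by a key prefix is a one-liner:

```sql
-- In raw HBase this would mean scanning for matching rows and issuing
-- individual Delete operations yourself; Phoenix does it declaratively.
DELETE FROM event_log WHERE tenant_id = 'tenant-1';
```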

e) Aggregations (GROUP BY) are quite handy in Phoenix.
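A typical aggregation over the hypothetical event table from above:

```sql
-- Server-side aggregation; Phoenix pushes the GROUP BY down to the regions.
SELECT tenant_id, COUNT(*) AS events, MAX(event_time) AS last_seen
FROM event_log
GROUP BY tenant_id
HAVING COUNT(*) > 100;
```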

f) Some of the Phoenix connectors are quite nice (e.g. the Spark extension functions, such as phoenixTableAsDataFrame).

g) When writing complex queries, I use the explain plan a lot to understand the scans.
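EXPLAIN prints the plan without running the query (table and predicate are illustrative):

```sql
-- Look for "FULL SCAN" vs "RANGE SCAN" / "SKIP SCAN" in the output to see
-- how the query will actually touch HBase.
EXPLAIN SELECT * FROM event_log
WHERE tenant_id = 'tenant-1'
  AND event_time > TO_TIMESTAMP('2020-01-01 00:00:00');
```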

h) Hints (I love SKIP_SCAN, especially during sampling); the hints that control join strategy are useful too.
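A hedged sketch of the skip-scan hint, again against the hypothetical composite-key table (whether a skip scan actually helps depends on the key shape and filter selectivity):

```sql
-- Forcing a skip scan when filtering on a trailing key column; without it,
-- a filter that skips the leading key column can degrade to a full scan.
SELECT /*+ SKIP_SCAN */ *
FROM event_log
WHERE event_time > TO_TIMESTAMP('2020-01-01 00:00:00');
```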

i) And there are goodies like CONVERT_TZ and the DATE functions in SQL.
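For instance, shifting a stored UTC timestamp into a local zone at query time (table name illustrative):

```sql
SELECT event_time,
       CONVERT_TZ(event_time, 'UTC', 'America/New_York') AS local_time,
       TRUNC(event_time, 'DAY') AS day_bucket
FROM event_log;
```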

j) Views are neat; lean base tables with projected views come in handy (especially in shared environments).
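In a shared environment, each team can get its own view over the same physical HBase table (names here are made up):

```sql
-- A view projecting one tenant's slice of the shared base table; views can
-- even carry their own secondary indexes.
CREATE VIEW tenant1_events AS
SELECT * FROM event_log WHERE tenant_id = 'tenant-1';
```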