1
votes

I have HBase running on five servers, with one table that contains a single column family. I need to run some map tasks on it, per key, and save the results. The main question is:

To preserve data locality, which is better: creating a new column family on the existing table, or creating a new table?

And the next question:

The HBase documentation suggests keeping the number of column families below three. As I said, I have more than ten map tasks and would like to keep each result in a new column family. What should I do, given that each map task is different from the others? Both locality and search cost are important.

1
Can you give a link to the documentation that mentions 3 column families per table? - AdamSkywalker
@AdamSkywalker, please see this link: hbase.apache.org/1.2/book.html#number.of.cfs - Hossein Vatani
Thanks for the link. Note that this recommendation is mostly about writing to different CFs. For reading, it is not really important. - AdamSkywalker

1 Answer

2
votes

which is better: creating a new column family on the existing table, or creating a new table?

I would recommend caring more about the schema and the simplicity of the table design than about trying to hack HBase internals for the best performance. If the information in these two column families is related and you need to access both CFs in map-reduce scans, keep them in the same table. If the information is 100% independent and you will never need to scan them together, keep them in different tables. Again, this is a schema design question; don't attempt premature optimisation.
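To make the "same table vs. different tables" choice a bit more concrete, here is a rough toy model in plain Java. It is not the HBase client API; it only assumes what HBase does in general: a table is split into regions by row-key range, each region holds all column families for its rows, and a separate table has its own, independently assigned regions. The region start keys, the server names and the row key `user42` are all made up for illustration.

```java
import java.util.Map;
import java.util.TreeMap;

// Toy model only: plain Java collections, not the HBase client API.
// Region start keys and server names are hypothetical.
class RegionLocalityDemo {

    // A table's regions, modeled as: region start key -> hosting server.
    // The region for a row is the one with the greatest start key <= row key.
    // Note there is no column-family parameter: in this model, as in HBase,
    // placement depends only on the table and the row key.
    static String serverFor(TreeMap<String, String> regions, String rowKey) {
        Map.Entry<String, String> region = regions.floorEntry(rowKey);
        if (region == null) {
            throw new IllegalArgumentException("no region covers row key: " + rowKey);
        }
        return region.getValue();
    }

    // The existing table: three regions (hypothetical split points).
    static TreeMap<String, String> existingTable() {
        TreeMap<String, String> regions = new TreeMap<>();
        regions.put("", "server1");   // first region starts at the empty key
        regions.put("g", "server2");
        regions.put("p", "server3");
        return regions;
    }

    // A separate results table: its own splits and its own assignment.
    static TreeMap<String, String> separateTable() {
        TreeMap<String, String> regions = new TreeMap<>();
        regions.put("", "server4");
        regions.put("m", "server5");
        return regions;
    }

    public static void main(String[] args) {
        String rowKey = "user42";

        // Input CF and a new result CF on the same table share one lookup,
        // so they are served from the same place.
        System.out.println("same table, any CF: " + serverFor(existingTable(), rowKey));

        // The same row key in another table goes through a different region map.
        System.out.println("separate table:     " + serverFor(separateTable(), rowKey));
    }
}
```

In this sketch, all families of `user42` in the existing table resolve to one server, while the same key in the separate table resolves through an unrelated region map. That is only an illustration of the trade-off; whether it matters for your jobs still comes down to whether the results are read together with the source data.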

Second question - I did not understand what you're asking, sorry.