Thanks for taking an interest in my question. Before I begin, I'd like to let you know that I'm very new to Hadoop and HBase. So far I find Hadoop very interesting, and I'd like to contribute more in the future.
I'm primarily interested in improving HBase's performance. To that end, I modified the Writer methods in HBase's io/hfile/HFile.java so that they do high-speed buffered data assembly and then write directly to Hadoop, from where the data can later be loaded by HBase.
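To make the idea concrete, here is a minimal sketch of the buffered-assembly pattern I mean: records are accumulated in an in-memory buffer and then flushed to the filesystem in one large write instead of many small ones. This is an illustration only, not my actual HFile.java change; the class name and file path are made up for the example, and I use the local filesystem so it runs without a cluster.

```java
import java.io.ByteArrayOutputStream;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BufferedWriteSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Local filesystem for the sketch; on a cluster this would be HDFS.
        FileSystem fs = FileSystem.getLocal(conf);
        Path out = new Path("/tmp/buffered-demo.bin");

        // Assemble many small records in memory first...
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        for (int i = 0; i < 1000; i++) {
            buffer.write(("row" + i + "\n").getBytes("UTF-8"));
        }

        // ...then push them to the filesystem in a single large write.
        FSDataOutputStream stream = fs.create(out, true);
        stream.write(buffer.toByteArray());
        stream.close();

        System.out.println("wrote " + fs.getFileStatus(out).getLen() + " bytes");
        fs.delete(out, false);
    }
}
```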
Now I'm trying to come up with a way to compress the key-value pairs so that bandwidth can be saved. I did a lot of research to figure out how, and then realized that HBase has built-in compression libraries.
I'm currently looking at SequenceFile (1), setCompressMapOutput (2) (deprecated), and the Compression class (3). I also found a tutorial on Apache's MapReduce.
Could someone explain what a SequenceFile is, and how I can use these compression libraries and algorithms? The different classes and documents are confusing to me.
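For reference, this is how far I've gotten with SequenceFile on my own: the sketch below writes key-value pairs with block compression and reads them back. It uses the local filesystem and DefaultCodec purely so it runs anywhere; I assume on a real cluster one would point it at HDFS and possibly a faster codec. Please correct me if this is not the intended usage.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.compress.DefaultCodec;

public class SequenceFileCompressionDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.getLocal(conf);  // local FS for the sketch
        Path path = new Path("/tmp/kv-demo.seq");

        // BLOCK compression batches many key-value pairs into one compressed
        // block, which usually compresses small records better than RECORD mode.
        SequenceFile.Writer writer = SequenceFile.createWriter(
                fs, conf, path, Text.class, Text.class,
                SequenceFile.CompressionType.BLOCK, new DefaultCodec());
        try {
            writer.append(new Text("row1"), new Text("value1"));
            writer.append(new Text("row2"), new Text("value2"));
        } finally {
            writer.close();
        }

        // Reading back: the reader decompresses transparently.
        SequenceFile.Reader reader = new SequenceFile.Reader(fs, path, conf);
        Text key = new Text();
        Text value = new Text();
        while (reader.next(key, value)) {
            System.out.println(key + " -> " + value);
        }
        reader.close();
        fs.delete(path, false);
    }
}
```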
I'd sincerely appreciate your help.
--
Hyperlinks:
(1): hadoop.apache.org/common/docs/current/api/org/apache/hadoop/io/SequenceFile.html
(2): hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapred/JobConf.html#setCompressMapOutput%28boolean%29
(3): www.apache.org/dist/hbase/docs/apidocs/org/apache/hadoop/hbase/io/hfile/Compression.html