I am using HBaseStorage with the -caching option in a Pig script as follows:
HBaseStorage('countDetails:ansCount countDetails:divCount countDetails:unansCount countDetails:engCount countDetails:ineffCount countDetails:totalCount', '-caching 1000');
I can see this reflected in my job.xml, but it makes no difference to the run time. I am processing 10 million records and storing around 160 MB of data into HBase. When I store the result in HDFS the job takes 3 minutes, but the same job takes 30 minutes when storing into HBase.
I even tried setting:
SET hbase.client.scanner.caching 1000;
Please let me know how I can reduce the time. Is there any alternative to HBaseStorage? http://apmblog.compuware.com/2013/02/19/speeding-up-a-pighbase-mapreduce-job-by-a-factor-of-15/
The blog above says that I have to set hbase.client.scanner.caching in a bootstrap script, but I don't know how to do that. Will it be enough if I set it in the HBase conf? Please help me out.
Add the hbase.client.scanner.caching property to the hbase-site.xml. Then restart the cluster (Hadoop and HBase); otherwise have a look at Amazon's docs about bootstrap actions: docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/… – Lorand Bendig
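For illustration, a minimal sketch of what that hbase-site.xml entry might look like. The value 1000 is simply the one used in the question, and the entry is assumed to sit inside the file's existing `<configuration>` element; adjust both to your cluster.

```xml
<!-- Sketch only: goes inside the existing <configuration> element of hbase-site.xml -->
<property>
  <name>hbase.client.scanner.caching</name>
  <!-- Rows fetched per scanner RPC; 1000 is the value from the question -->
  <value>1000</value>
</property>
```

Note that this property controls how many rows a scanner fetches per round trip, so it mainly affects jobs that read from HBase; it may not change the time taken by a STORE.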