
I am unable to load data from Pig into HBase on Cloudera CDH3. When I dump the relation, the data shows up correctly, but when I try to write it to HBase with a STORE command, Pig finds the table and launches a MapReduce job, which then ultimately fails with the following error:

failed to read data from "test/NYSE_daily_prices_Q.csv"

and at the very end

2015-02-16 11:29:44,266 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed!
2015-02-16 11:29:44,268 [main] ERROR org.apache.pig.tools.grunt.GruntParser - ERROR 2999: Unexpected internal error. Row key is invalid
Details at logfile: /home/cloudera/pig_1424114902913.log

Here is the code I used. Can someone please help me resolve this issue?

data = LOAD '/test/NYSE_daily_prices_Q.csv' USING PigStorage(',')
    AS (exchange:chararray, symbol:chararray, date:chararray,
        stock_price_open:float, stock_price_high:float, stock_price_low:float,
        stock_price_close:float, stock_volume:int, stock_price_adj_close:float);

DUMP data;

STORE data INTO 'hbase://NYStockDetails'
    USING org.apache.pig.backend.hadoop.hbase.HBaseStorage(
        'info:exchange info:symbol info:date info:stock_price_open info:stock_price_high info:stock_price_low info:stock_price_close info:stock_volume info:stock_price_adj_close');


1 Answer


When you execute the command locally (which is what I assume you mean when you say you dump the data), your command LOAD '/test/NYSE_daily_prices_Q.csv' is able to point to a specific file on the local filesystem.

When you execute the same command with the HBase export appended, a map-only MapReduce job kicks off. The mapper will be running on a random node in your cluster, and so won't necessarily have access to NYSE_daily_prices_Q.csv, which I presume is stored locally on only one node. That is why you get an error when the job tries to load the data into HBase.

The solution is to add the file to HDFS and then load it from there, i.e. LOAD 'hdfs://my-hdfs-location/test/NYSE_daily_prices_Q.csv'.
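For concreteness, here is a minimal sketch of that fix. It assumes your NameNode answers at my-hdfs-location (a placeholder you would replace with your own address) and that the CSV currently sits at /test/NYSE_daily_prices_Q.csv on the local disk. Note also that, as far as I know, HBaseStorage treats the first field of each tuple as the HBase row key, so the STORE below lists column mappings only for the remaining eight fields.

First stage the file into HDFS from a shell:

# create the target directory in HDFS if it does not already exist
hadoop fs -mkdir -p /test
# copy the CSV from the local filesystem into HDFS
hadoop fs -put /test/NYSE_daily_prices_Q.csv /test/NYSE_daily_prices_Q.csv

Then point the Pig script at the HDFS path instead of the local one:

-- load from HDFS rather than the local filesystem
data = LOAD 'hdfs://my-hdfs-location/test/NYSE_daily_prices_Q.csv' USING PigStorage(',')
    AS (exchange:chararray, symbol:chararray, date:chararray,
        stock_price_open:float, stock_price_high:float, stock_price_low:float,
        stock_price_close:float, stock_volume:int, stock_price_adj_close:float);

-- the first field (exchange) becomes the row key, so only the
-- remaining eight fields are mapped to HBase columns here
STORE data INTO 'hbase://NYStockDetails'
    USING org.apache.pig.backend.hadoop.hbase.HBaseStorage(
        'info:symbol info:date info:stock_price_open info:stock_price_high info:stock_price_low info:stock_price_close info:stock_volume info:stock_price_adj_close');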