
I have a 2-node HBase cluster running on Amazon EC2 (Hadoop 1.0.1, Hive 0.11.0, HBase 0.94.11, ZooKeeper 3.4.3) and have created one EMR node with AMI 2.4.1.

On the EMR instance I have one external table pointing to a location on S3. I have also created an HBase-backed Hive table (HBase table modelvarlarge, Hive table modelvar). Now I am trying to insert data from logdata into modelvar.

But the reducer phase gets stuck at 99% and fails with the following error. FYI, through zkCli I am able to connect from EMR to the EC2 ZooKeeper.
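For reference, the ZooKeeper connectivity check mentioned above can be reproduced with the stock zkCli client; this is only a sketch, and the host name is a placeholder:

    # from an EMR node, check that the EC2 ZooKeeper is reachable
    # ("ec2-zookeeper-host" is a placeholder for the real address)
    zkCli.sh -server ec2-zookeeper-host:2181
    ls /hbase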

External table:

    create external table logdata(
        date_local string, time_local string, s_computername string,
        c_ip string, s_ip string, s_port string, s_sitename string, referer string, localfile string,
        TimeTakenMS string, status string, w3status string, sc_substatus string, uri string, qs string,
        sc_bytes string, cs_bytes string, cs_username string, cs_User_Agent string, s_proxy string, c_protocol string,
        cs_version string, cs_method string, cs_Cookie string, cs_Host string, w3wpbytes string, RequestsPerSecond string,
        CPU_Utilization string, BeginRequest_UTC string, EndRequest_UTC string, time string, logdate string)
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '\001'
    LOCATION 's3://xxxxxxxxx';
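A quick way to confirm that the external table actually reads the S3 data is a small probe query; this is just a sketch and assumes log files already exist at the S3 location:

    -- sanity check: the external table should return rows from S3
    select cs_Cookie, uri from logdata limit 5;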

HBase-Hive table:

    CREATE TABLE modelvar(cookie string, pageviews string, visit string) 
    STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
    WITH SERDEPROPERTIES ("hbase.columns.mapping" = "m:pageviews,m:visit")
    TBLPROPERTIES ("hbase.table.name"="modelvarlarge");
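For the Hive session on EMR to reach the HBase cluster on EC2, the storage handler also needs to know the remote ZooKeeper quorum; a minimal sketch of how that is typically set, with the hostname as a placeholder:

    -- point the HBase storage handler at the EC2 ZooKeeper ensemble
    -- ("ec2-zookeeper-host" is a placeholder for the real address)
    SET hbase.zookeeper.quorum=ec2-zookeeper-host;
    SET hbase.zookeeper.property.clientPort=2181;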

Query:

    insert into table modelvar
    select x.cookie, hits, visit
    from (select cs_Cookie as Cookie, count(*) as hits
          from logdata
          where (uri like '%.aspx%' or uri like '%.html%')
          group by cs_Cookie) x
    join (select cs_Cookie as Cookie, count(distinct cs_Cookie) as visit
          from logdata
          group by cs_Cookie) y
      on x.cookie = y.cookie
    order by hits desc;

Error:

    java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {"key":{"reducesinkkey0":24655},"value":{"_col0":"-","_col1":24655,"_col2":17},"alias":0}
        at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:278)
        at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:528)
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:429)
        at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1132)
        at org.apache.hadoop.mapred.Child.main(Child.java:249)
    Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {"key":{"reducesinkkey0":24655},"value":{"_col0":"-","_col1":24655,"_col2":17},"alias":0}
        at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:266)
        ... 7 more
    Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@10f00d3 closed
        at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getHiveRecordWriter(HiveFileFormatUtils.java:241)
        at org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketFiles(FileSinkOperator.java:539)
        at org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:621)
        at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:502)
        at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:832)
        at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
        at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:502)
        at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:832)
        at org.apache.hadoop.hive.ql.exec.ExtractOperator.processOp(ExtractOperator.java:45)
        at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:502)
        at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:257)
        ... 7 more
    Caused by: java.io.IOException: org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@10f00d3 closed
        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:794)
        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:782)
        at org.apache.hadoop.hbase.client.HTable.finishSetup(HTable.java:249)
        at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:213)
        at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:171)
        at org.apache.hadoop.hive.hbase.HiveHBaseTableOutputFormat.getHiveRecordWriter(HiveHBaseTableOutputFormat.java:82)
        at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getRecordWriter(HiveFileFormatUtils.java:250)
        at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getHiveRecordWriter(HiveFileFormatUtils.java:238)
        ... 17 more

1 Answer


You need to define the host-to-IP mapping on every node of the EMR cluster. Let's say you are running a 3-node HBase cluster on EC2 and its IPs are

 ip1, ip2, ip3

and these have been given aliases in the /etc/hosts file of the EC2 HBase cluster like this:

ip1 master
ip2 rgserver1
ip3 rgserver3

So, in the /etc/hosts file of each and every EMR node you also need to define the same mapping, as sketched below. Otherwise the reducers won't be able to write the data to the HBase cluster.
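A minimal sketch of applying that on the EMR side, assuming ip1/ip2/ip3 and the aliases stand in for the real values from the EC2 hosts file:

    # run on every EMR node; ip1/ip2/ip3 and the aliases are placeholders
    # for the actual entries in the EC2 cluster's /etc/hosts
    sudo bash -c 'cat >> /etc/hosts <<EOF
    ip1 master
    ip2 rgserver1
    ip3 rgserver3
    EOF'

This matters because ZooKeeper hands the client the region servers' hostnames, so every EMR node must be able to resolve those names, not just the ZooKeeper IP.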