
I have created a view of an HBase table in Hive with 10 million rows. When I run the query below, distcp is invoked and it throws the error shown further down.

INSERT OVERWRITE DIRECTORY '/mapred/INPUT' select hive_cdper1.cid,hive_cdper1.emptyp,hive_cdper1.ethtyp,hive_cdper1.gdtyp,hive_cdseg.mrtl from hive_cdper1 join hive_cdseg on hive_cdper1.cnm=hive_cdseg.cnm;

Output: map 100% reduce 100%

2016-10-17 15:05:34,688 INFO [main]: exec.Task (SessionState.java:printInfo(951)) - Moving data to: /mapred/INPUT from hdfs://mycluster/mapred/INPUT/.hive-staging_hive_2016-10-17_14-57-48_620_6609613978089243090-1/-ext-10000
2016-10-17 15:05:34,693 INFO [main]: common.FileUtils (FileUtils.java:copy(551)) - Source is 483335659 bytes. (MAX: 4000000)
2016-10-17 15:05:34,693 INFO [main]: common.FileUtils (FileUtils.java:copy(552)) - Launch distributed copy (distcp) job.
2016-10-17 15:05:34,695 ERROR [main]: exec.Task (SessionState.java:printError(960)) - Failed with exception Unable to move source hdfs://mycluster/mapred/INPUT/.hive-staging_hive_2016-10-17_14-57-48_620_6609613978089243090-1/-ext-10000 to destination /mapred/INPUT
org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source hdfs://mycluster/mapred/INPUT/.hive-staging_hive_2016-10-17_14-57-48_620_6609613978089243090-1/-ext-10000 to destination /mapred/INPUT
    at org.apache.hadoop.hive.ql.metadata.Hive.moveFile(Hive.java:2644)
    at org.apache.hadoop.hive.ql.exec.MoveTask.moveFile(MoveTask.java:105)
    at org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:222)
    at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160)
    at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:88)
    at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1653)
    at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1412)
    at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1195)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1059)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1049)
    at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:213)
    at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:165)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:376)
    at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:736)
    at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681)
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:621)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: java.io.IOException: Cannot get DistCp constructor: org.apache.hadoop.tools.DistCp.<init>()
    at org.apache.hadoop.hive.shims.Hadoop23Shims.runDistCp(Hadoop23Shims.java:1160)
    at org.apache.hadoop.hive.common.FileUtils.copy(FileUtils.java:553)
    at org.apache.hadoop.hive.ql.metadata.Hive.moveFile(Hive.java:2622)
    ... 21 more

What I wonder here is: I am writing to the same cluster, so why is it invoking distcp instead of a normal copy?
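The log lines above hint at the answer: Hive's move step compares the staged result's size in bytes against hive.exec.copyfile.maxsize and launches a distcp job when the result is larger, even within one cluster. A minimal sketch of that decision (not Hive's actual code), using the two numbers from the log:

```shell
# Simplified sketch of the size check that FileUtils.copy logs:
# "Source is 483335659 bytes. (MAX: 4000000)" — when the source exceeds
# the configured maximum, Hive launches distcp instead of a plain copy.
SRC_BYTES=483335659   # staged result size reported in the log
MAX_COPY=4000000      # hive.exec.copyfile.maxsize value from the log
if [ "$SRC_BYTES" -gt "$MAX_COPY" ]; then
  DECISION="distcp"
else
  DECISION="plain copy"
fi
echo "$DECISION"   # prints: distcp
```

So a threshold of 4,000,000 bytes still routes this ~461 MB result through distcp; only a threshold above the source size keeps the move as a single filesystem copy.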

Here I am using Hive 1.2.1 with Hadoop 2.7.2, and my cluster name is mycluster.

Note: I have tried setting hive.exec.copyfile.maxsize=4000000, but it didn't work.

I would appreciate your suggestions.

Can you try the same query with the "INSERT OVERWRITE LOCAL DIRECTORY" option? I have other commands to extract data from a Hive table in CSV format. - Vijay_Shinde
Yes, it works fine with LOCAL, but not in HDFS. Is it not possible in HDFS, or is there a problem with the setup? - Ranjan Swain

2 Answers

0 votes

1) Check the permissions of your destination path /mapred/INPUT.

2) If other users do not have write permission there, grant it: hadoop fs -chmod a+w /mapred/INPUT

0 votes

Setting the property below in hive-site.xml solved my issue.

  <property>
    <name>hive.exec.copyfile.maxsize</name>
    <value>3355443200</value>
    <description>Maximum file size (in bytes) that Hive uses to do single HDFS copies between directories. Distributed copies (distcp) will be used instead for bigger files so that copies can be done faster.</description>
  </property>
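If editing hive-site.xml and restarting is inconvenient, the same threshold can usually be raised for a single run instead, assuming the property is not blocked by hive.conf.restricted.list on your cluster. A sketch using the Hive CLI's --hiveconf flag and the query from the question:

```shell
# One-off equivalent of the hive-site.xml property: raise the single-copy
# threshold past the 483,335,659-byte result so the final move is a plain
# filesystem copy rather than a distcp job. Requires a running Hive setup.
hive --hiveconf hive.exec.copyfile.maxsize=3355443200 -e "
INSERT OVERWRITE DIRECTORY '/mapred/INPUT'
SELECT hive_cdper1.cid, hive_cdper1.emptyp, hive_cdper1.ethtyp,
       hive_cdper1.gdtyp, hive_cdseg.mrtl
FROM hive_cdper1 JOIN hive_cdseg ON hive_cdper1.cnm = hive_cdseg.cnm;"
```

The value 3355443200 (about 3.1 GB) comfortably exceeds the ~461 MB staged result from the log, which is why this setting avoids the broken distcp path.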