I have created a view of hbase in hive with 10 miliion rows and when i am running below query ,distcp is invoked and it throws below error.
INSERT OVERWRITE DIRECTORY '/mapred/INPUT' select hive_cdper1.cid,hive_cdper1.emptyp,hive_cdper1.ethtyp,hive_cdper1.gdtyp,hive_cdseg.mrtl from hive_cdper1 join hive_cdseg on hive_cdper1.cnm=hive_cdseg.cnm;
Output:map 100% reduce 100%
2016-10-17 15:05:34,688 INFO [main]: exec.Task (SessionState.java:printInfo(951)) - Moving data to: /mapred/INPUT from hdfs://mycluster/mapred/INPUT/.hive-staging_hive_2016-10-17_14-57-48_620_6609613978089243090-1/-ext-10000 2016-10-17 15:05:34,693 INFO [main]: common.FileUtils (FileUtils.java:copy(551)) - Source is 483335659 bytes. (MAX: 4000000) 2016-10-17 15:05:34,693 INFO [main]: common.FileUtils (FileUtils.java:copy(552)) - Launch distributed copy (distcp) job. 2016-10-17 15:05:34,695 ERROR [main]: exec.Task (SessionState.java:printError(960)) - Failed with exception Unable to move source hdfs://mycluster/mapred/INPUT/.hive-staging_hive_2016-10-17_14-57-48_620_6609613978089243090-1/-ext-10000 to destination /mapred/INPUT org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source hdfs://mycluster/mapred/INPUT/.hive-staging_hive_2016-10-17_14-57-48_620_6609613978089243090-1/-ext-10000 to destination /mapred/INPUT at org.apache.hadoop.hive.ql.metadata.Hive.moveFile(Hive.java:2644) at org.apache.hadoop.hive.ql.exec.MoveTask.moveFile(MoveTask.java:105) at org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:222) at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160) at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:88) at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1653) at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1412) at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1195) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1059) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1049) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:213) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:165) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:376) at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:736) at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:621) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.hadoop.util.RunJar.run(RunJar.java:221) at org.apache.hadoop.util.RunJar.main(RunJar.java:136) Caused by: java.io.IOException: Cannot get DistCp constructor: org.apache.hadoop.tools.DistCp.() at org.apache.hadoop.hive.shims.Hadoop23Shims.runDistCp(Hadoop23Shims.java:1160) at org.apache.hadoop.hive.common.FileUtils.copy(FileUtils.java:553) at org.apache.hadoop.hive.ql.metadata.Hive.moveFile(Hive.java:2622) ... 21 more
What i wonder here is:i am writing to the same cluster ,then why it is invoking distcp instead of normal cp.
Here i am using hive 1.2.1 with hadoop 2.7.2 and my cluster name is mycluster.
Note:i have tried setting hive.exec.copyfile.maxsize=4000000
but didnt work.
Appreciate your suggestions..