1 vote

I am copying data from one Hive table to another (external) Hive table in Spark SQL code, for a data volume of 74 million rows (~50 GB). The insert operation is taking more than 40 minutes.

hiveContext.sql("insert overwrite table dev_work.WORK_CUSTOMER select * from dev_warehouse.CUSTOMER")
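One Spark-side knob worth checking is the write parallelism. Below is a hedged sketch, assuming Spark 1.6 (as shipped with CDH 5.8) and the Scala HiveContext from the question; the partition count of 200 is purely illustrative and should be tuned to the cluster:

val df = hiveContext.table("dev_warehouse.CUSTOMER")
// Repartitioning adds a shuffle, but it lets you control the number of
// concurrent write tasks; whether it helps depends on how many input
// splits the source table produces.
df.repartition(200)
  .write
  .mode("overwrite")
  .insertInto("dev_work.WORK_CUSTOMER")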

I have tried other data-copy approaches such as:

  1. hdfs dfs -cp for these external tables (see the metastore note after this list):

hdfs dfs -cp hdfs:/home/dummy/dev_dwh/CUSTOMER hdfs:/home/dummy/dev_work/WORK_CUSTOMER

  2. Export/Import:
export table dev_warehouse.CUSTOMER to 'hdfs_exports_location/customer';
import external table dev_work.WORK_CUSTOMER from 'hdfs_exports_location/customer';
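Note that the raw-file copy in option 1 only moves data; Hive's metastore does not automatically see it. A hedged follow-up, assuming dev_work.WORK_CUSTOMER is a partitioned external table whose LOCATION already points at the destination directory:

MSCK REPAIR TABLE dev_work.WORK_CUSTOMER;

For a non-partitioned external table, pointing the table's LOCATION at the copied directory is enough.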

Cluster details:

CDH 5.8, 19-node cluster

Could you please help tune the performance, or suggest an alternative way to perform a faster data copy?

Thanks, Arvind


1 Answer

0 votes

Try Hadoop DistCp, which is a tool built for large inter-/intra-cluster copying:

http://hadoop.apache.org/docs/r2.7.3/hadoop-distcp/DistCp.html
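A minimal invocation, reusing the question's source and target paths (an intra-cluster copy, so both resolve against the default namenode); the map count of 50 is illustrative:

hadoop distcp -m 50 -overwrite /home/dummy/dev_dwh/CUSTOMER /home/dummy/dev_work/WORK_CUSTOMER

DistCp runs as a MapReduce job, so the copy is spread across the cluster instead of a single client, and -m caps the number of parallel map tasks.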