I am copying data from one Hive table to another (external) Hive table using Spark SQL. The source has 74 million rows (~50 GB), and the insert takes more than 40 minutes:
hiveContext.sql("insert overwrite table dev_work.WORK_CUSTOMER select * from dev_warehouse.CUSTOMER")
I have also tried other ways of copying the data, such as:
- hdfs dfs -cp between the external tables' directories:
hdfs dfs -cp hdfs:/home/dummy/dev_dwh/CUSTOMER hdfs:/home/dummy/dev_work/WORK_CUSTOMER
- Export/Import:
export table dev_warehouse.CUSTOMER to 'hdfs_exports_location/customer';
import external table dev_work.WORK_CUSTOMER from 'hdfs_exports_location/customer';
Cluster details:
CDH 5.8, 19-node cluster
Could you please help me tune this, or suggest an alternative way to copy the data faster?
Thanks, Arvind