I have a job to transfer Hive tables between Hadoop clusters. What I did was download the ORC files from the source Hadoop cluster and then upload them into the target HDFS cluster using the following commands:
hadoop fs -get
hadoop fs -put
The ORC files in the target Hadoop cluster can be read in a Spark application in the following way:
df = sqlContext.sql('select * from orc.`path_to_where_orc_file_is`')
However, there is no corresponding table in Hive on the target Hadoop cluster.
Is there a way to create a table in Hive from ORC files in HDFS without specifying the DDL or schema? After all, the ORC files themselves contain the schema information.
The reason I am asking is that the schema of the original Hive table is deeply nested and has many fields.
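For example, Spark already picks up the embedded schema when it reads the files directly, so something like the following (using the same placeholder path as above) prints the whole nested schema without me supplying any DDL:

df = sqlContext.read.format('orc').load('path_to_where_orc_file_is')  # schema is inferred from the ORC files
df.printSchema()  # shows the full nested schema embedded in the files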
Currently the only solution I can think of is to read those ORC files in Spark and write them back out with saveAsTable, roughly as follows:
dfTable.write.format("orc").mode(SaveMode.Overwrite).saveAsTable("db1.test1")
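In PySpark (which is what I use for the read above), the full round trip would look roughly like the sketch below, assuming Spark 1.5+ and that sqlContext is a HiveContext; the path is the same placeholder as before and 'overwrite' is just the string form of SaveMode.Overwrite:

df = sqlContext.read.format('orc').load('path_to_where_orc_file_is')  # schema comes from the ORC files, no DDL needed
df.write.format('orc').mode('overwrite').saveAsTable('db1.test1')  # registers db1.test1 in the Hive metastore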