
I have a job that transfers Hive tables between Hadoop clusters. What I did was download the ORC files from the source cluster and then upload them to the target HDFS cluster using the following commands:

hadoop fs -get <hdfs_path_on_source> <local_dir>    # download from the source cluster
hadoop fs -put <local_dir> <hdfs_path_on_target>    # upload to the target cluster

The ORC files on the target Hadoop cluster can be read in a Spark application like this:

df = sqlContext.sql('select * from orc.`path_to_where_orc_file_is`') 

However, there is no corresponding table in Hive on the target Hadoop cluster.

Is there a way to create a table in Hive from an ORC file in HDFS without specifying the DDL or schema, given that ORC files themselves contain the schema information?

The reason I am asking is that the schema of the original Hive table is deeply nested and has many fields.

Currently the only solution I can think of is to read those ORC files in Spark and write them back out with saveAsTable, as follows:

dfTable.write.format("orc").mode(SaveMode.Overwrite).saveAsTable("db1.test1")
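
In full, that workaround looks roughly like this (a minimal sketch assuming Spark 2.x; the input path is a placeholder, and the session needs Hive support enabled so that saveAsTable registers the table in the Hive metastore):

import org.apache.spark.sql.{SaveMode, SparkSession}

// Hive support is required for saveAsTable to create the table in the Hive metastore.
val spark = SparkSession.builder()
  .enableHiveSupport()
  .getOrCreate()

// Read the copied ORC files; Spark picks up the schema from the files themselves.
val dfTable = spark.read.orc("/path/to/orc/files")   // placeholder path

// Rewrite the data as a Hive table.
dfTable.write.format("orc").mode(SaveMode.Overwrite).saveAsTable("db1.test1")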

1 Answer

val table = spark.read.orc("hdfspath")
table.printSchema()

table is a DataFrame, and it already carries the schema read from the ORC files.
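
If you do not want to rewrite the data with saveAsTable, one option is to turn that inferred schema into DDL and create an external Hive table over the files already in HDFS. This is a sketch, assuming Spark 2.4+ (for schema.toDDL), a Hive-enabled session, and placeholder database, table and path names:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .enableHiveSupport()
  .getOrCreate()

// Let Spark infer the schema directly from the ORC files.
val table = spark.read.orc("hdfs:///path/to/orc/files")

// Turn the inferred schema into a column list and register an external Hive
// table over the files that are already in place, so nothing is rewritten.
spark.sql(
  s"""CREATE EXTERNAL TABLE IF NOT EXISTS db1.test1 (${table.schema.toDDL})
     |STORED AS ORC
     |LOCATION 'hdfs:///path/to/orc/files'""".stripMargin)

schema.toDDL handles nested struct fields as well, which helps when the original table's schema is deeply nested.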