Issue inserting data into hive table using spark

Question

I am currently working with Spark 2.1.0. As part of my data ingestion job, I have to use the insertInto method to ingest data into Hive tables. However, there seems to be a problem in Spark 2.1: insertInto does not maintain the column sequence when inserting data into a Hive table.

I have already tried the saveAsTable method with append mode, but it did not work because I create the tables manually, with the correct data types, before ingesting any data.

I have also tried creating a Spark DataFrame from the existing Hive table, reading the column sequence from it, and using that list to enforce the column order. But this means creating a DataFrame on top of the Hive table every time, just to get the column sequence. Will loading the Hive table into a DataFrame on every run cause memory overhead?

Does anybody know a better approach to maintaining the column sequence during data ingestion into a Hive table?

Steven Steven · Accepted Answer · 2019-02-26T16:01:52

You could try first acquiring the columns of the Hive table and then applying them to your Spark DataFrame:

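# Look up the target table to read its column order
target_table = sqlContext.table("my_target_table")
# Reorder the DataFrame's columns to match, then insert by position
my_df.select(*target_table.columns).write.insertInto("my_target_table")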
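
On the memory concern: as far as I know, calling table() only builds a lazy DataFrame, and reading .columns from it resolves the schema from the metastore without scanning any rows, so the per-run overhead should be small. Below is a minimal sketch of the same idea wrapped in a reusable function. The helper names align_columns and insert_in_table_order are hypothetical, not part of the Spark API, and the sketch assumes column names can be matched case-insensitively (Hive's default behaviour) and that any extra DataFrame columns may be dropped.

```python
def align_columns(df_columns, target_columns):
    """Return the DataFrame's column names reordered to the target table's order.

    Hypothetical helper: names are matched case-insensitively, and columns
    that exist only in the DataFrame are left out of the result.
    """
    lookup = {name.lower(): name for name in df_columns}
    missing = [name for name in target_columns if name.lower() not in lookup]
    if missing:
        raise ValueError("DataFrame is missing target columns: %s" % missing)
    return [lookup[name.lower()] for name in target_columns]


def insert_in_table_order(spark, df, table_name):
    """Reorder df to match table_name's column order, then insert into it.

    `spark` is assumed to be a SparkSession (or SQLContext) with Hive support;
    `table_name` is assumed to already exist.
    """
    # Only the schema is read here; no rows are loaded from the table.
    target_columns = spark.table(table_name).columns
    ordered = align_columns(df.columns, target_columns)
    # insertInto resolves columns by position, so the select must come first.
    df.select(*ordered).write.insertInto(table_name)
```

With this in place, the call would look like insert_in_table_order(sqlContext, my_df, "my_target_table"). Failing early on a missing column is a design choice: since insertInto matches by position rather than by name, a silently shorter column list could otherwise shift values into the wrong columns.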
