I have a log file whose first column should become the partition column of a Hive table.

    logSchemaRDD.registerTempTable("logs")

    hiveContext.sql("insert overwrite table logs_parquet PARTITION(create_date=select ? from logs) select * from logs")

How do I construct the query so that it selects the first column (marked as ? here) as the partition value, and how do I ensure that this value matches the row returned by the second select (*)?

1 Answer

You need to explicitly enumerate the columns in both the source and target list: in this case select * will not suffice.

    insert overwrite table logs_parquet PARTITION(create_date) (col2, col3..)
    select col2,col3, .. col1 from logs

Yes, it is more work to write the query, but partitioned inserts do require an explicit mapping of the columns, with the partitioning columns listed last.
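
If writing the column list out by hand gets tedious, the reordering can be generated. Below is a minimal sketch in Python that only builds the SQL string. The helper name `build_partitioned_insert` and the column names `col1`, `col2`, `col3` are made up for illustration, and the sketch relies on Hive mapping the last selected column to the dynamic partition by position, so it leaves out the target column list. It has not been run against a live Hive metastore.

```python
def build_partitioned_insert(source_table, target_table, source_columns,
                             source_partition_column, target_partition_name):
    """Build a dynamic-partition INSERT OVERWRITE statement.

    The source column that feeds the partition is moved to the end of the
    SELECT list; all other columns keep their original order.
    """
    if source_partition_column not in source_columns:
        raise ValueError(
            "%r is not one of the source columns" % source_partition_column)

    # Every column except the one that feeds the partition, in original order.
    data_columns = [c for c in source_columns if c != source_partition_column]

    # The partition-feeding column goes last in the SELECT list.
    select_list = ", ".join(data_columns + [source_partition_column])

    return ("INSERT OVERWRITE TABLE {target} PARTITION ({part}) "
            "SELECT {cols} FROM {source}").format(
                target=target_table,
                part=target_partition_name,
                cols=select_list,
                source=source_table)


# Hypothetical column names: col1 is the first column of the log file.
sql = build_partitioned_insert(
    source_table="logs",
    target_table="logs_parquet",
    source_columns=["col1", "col2", "col3"],
    source_partition_column="col1",
    target_partition_name="create_date",
)
print(sql)

# With a HiveContext available, the statement would then be passed on:
# hiveContext.sql(sql)
```

Because every partition column here is dynamic, Hive will probably also need `hive.exec.dynamic.partition` set to `true` and `hive.exec.dynamic.partition.mode` set to `nonstrict` before the insert runs. Check that against your own configuration.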