I am using Airflow's HiveServer2Hook to fetch results from a Hive table and write them to a CSV file. The hook's to_csv function has a parameter 'output_header'. When it is set to True, the column names are written to the CSV file along with the data, but in the form tablename.columnname. In the CSV header I need only the column names, without the tablename prefix. Can I override the parameter somehow so that only the column names are written? Is there any other way to retrieve just the column names using HiveServer2Hook?
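To make the desired header format concrete, here is a minimal sketch of the transformation I am after. The helper name strip_table_prefix and the sample column names are hypothetical and are not part of the hook; the sketch assumes that the column name itself never contains a dot.

```python
def strip_table_prefix(column_names):
    """Return the column names with any leading 'tablename.' prefix removed.

    Hypothetical helper, not part of HiveServer2Hook. It keeps only the text
    after the last dot, so names without a prefix pass through unchanged.
    """
    return [name.rsplit(".", 1)[-1] for name in column_names]


# Sample header in the form the hook currently writes (made-up names).
header = ["mytable.id", "mytable.name", "created_at"]
print(strip_table_prefix(header))  # ['id', 'name', 'created_at']
```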
I have already connected to Hive using HiveServer2Hook and executed the hook's to_csv function successfully. The only thing I need to change is the format of the column names it returns. The source for the hook is linked below; the to_csv, get_records and get_results functions are all defined under HiveServer2Hook.
https://airflow.apache.org/_modules/airflow/hooks/hive_hooks.html
I also tried to get the column names by running the following HQL statements:
1) describe tablename;
2) show columns from tablename;
The hook's get_records and get_results functions both fail on these statements, apparently because the result returned by 'describe' and 'show columns' is not in the format the hook expects. Both functions break on the following line:
header = next(results_iter)
Is there any other way to get column names, write to CSV and pull data using HiveServer2Hook and Python?
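One workaround I am considering is to let to_csv write the file as it does now and then rewrite only the header row afterwards. The following is a rough sketch of that idea using just the standard library. The function name rewrite_csv_header is hypothetical, and it assumes the file was written with a header row, a single-character delimiter, and a '\r\n' line terminator.

```python
import csv
import os
import tempfile


def rewrite_csv_header(csv_path, delimiter=","):
    """Rewrite the first row of csv_path, dropping 'tablename.' prefixes.

    Hypothetical post-processing step, not part of HiveServer2Hook. The data
    rows are copied through unchanged; only the header row is modified.
    """
    dir_name = os.path.dirname(os.path.abspath(csv_path))
    with open(csv_path, newline="") as src, tempfile.NamedTemporaryFile(
        "w", newline="", delete=False, dir=dir_name
    ) as dst:
        reader = csv.reader(src, delimiter=delimiter)
        writer = csv.writer(dst, delimiter=delimiter, lineterminator="\r\n")
        header = next(reader, None)  # None if the file is empty
        if header is not None:
            writer.writerow([name.rsplit(".", 1)[-1] for name in header])
            writer.writerows(reader)
        tmp_path = dst.name
    # Swap the rewritten file into place once both handles are closed.
    os.replace(tmp_path, csv_path)


# Intended use (assumption): call this right after the hook has written the
# file, e.g. after hook.to_csv(hql=..., csv_filepath=path).
```

This streams the file row by row instead of loading it into memory, but it still means a second pass over the data, so I would prefer a way to get clean column names from the hook directly.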