I am new to PySpark and am trying to read HDFS files (which have Hive tables created on top of them) into PySpark dataframes. Reading the Hive tables directly through PySpark is time consuming. Is there a way to get the Hive column names dynamically (to use as the schema for the dataframe)?
I am looking to pass the file location, table name, and database name as inputs to a program/function that fetches the schema/column names from the Hive metadata (probably the metastore) and returns a dataframe, something like the sketch below.
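This is roughly what I have in mind (a rough sketch, not a working solution: it assumes a Hive-enabled SparkSession, uses `spark.catalog.listColumns` for the metastore lookup, assumes the underlying files are CSV, and all paths/names are placeholders):

```python
from pyspark.sql import SparkSession

# Assumes Hive support is available; app name is a placeholder.
spark = (
    SparkSession.builder
    .appName("hive-schema-reader")
    .enableHiveSupport()
    .getOrCreate()
)

def read_with_hive_schema(file_location, table_name, db_name):
    """Look up column names/types in the Hive metastore and use them
    as the schema when reading the raw HDFS files."""
    # Each entry has .name, .dataType (e.g. "int", "string") and .isPartition
    cols = spark.catalog.listColumns(table_name, db_name)
    # Partition columns live in directory names, not in the data files,
    # so skip them when building the file schema (DDL string form).
    ddl = ", ".join(
        f"{c.name} {c.dataType}" for c in cols if not c.isPartition
    )
    # Assuming CSV files here; format/options would change per file type.
    return spark.read.schema(ddl).csv(file_location)

# Placeholder path, table, and database names.
df = read_with_hive_schema("/user/hive/warehouse/mydb.db/mytable",
                           "mytable", "mydb")
df.printSchema()
```

Is this the right approach, or is there a better way to pull the schema from the Hive metadata?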
Please advise.