I am finding it difficult to load Parquet files into Hive tables. I am working on an Amazon EMR cluster with Spark for data processing, and I need to read the output Parquet files to validate my transformations. The Parquet files have the following schema:
root
|-- ATTR_YEAR: long (nullable = true)
|-- afil: struct (nullable = true)
| |-- clm: struct (nullable = true)
| | |-- amb: struct (nullable = true)
| | | |-- L: string (nullable = true)
| | | |-- cdTransRsn: string (nullable = true)
| | | |-- dist: struct (nullable = true)
| | | | |-- T: string (nullable = true)
| | | | |-- content: double (nullable = true)
| | | |-- dscStrchPurp: string (nullable = true)
| | |-- amt: struct (nullable = true)
| | | |-- L: string (nullable = true)
| | | |-- T: string (nullable = true)
| | | |-- content: double (nullable = true)
| | |-- amtTotChrg: double (nullable = true)
| | |-- cdAccState: string (nullable = true)
| | |-- cdCause: string (nullable = true)
How can I create a Hive external table matching this schema and load the Parquet files into it for analysis?
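In case it helps to anchor an answer, here is a rough sketch of the DDL I imagine might work, covering only the fields shown above (the table name `claims_parquet` and the S3 location are placeholders, and I am not sure the nested STRUCT mapping is correct):

    CREATE EXTERNAL TABLE claims_parquet (
      attr_year BIGINT,
      afil STRUCT<
        clm: STRUCT<
          amb: STRUCT<
            l: STRING,
            cdtransrsn: STRING,
            dist: STRUCT<t: STRING, content: DOUBLE>,
            dscstrchpurp: STRING
          >,
          amt: STRUCT<l: STRING, t: STRING, content: DOUBLE>,
          amttotchrg: DOUBLE,
          cdaccstate: STRING,
          cdcause: STRING
        >
      >
    )
    STORED AS PARQUET
    LOCATION 's3://my-bucket/path/to/parquet/';  -- placeholder path

My understanding is that because the table is EXTERNAL and points at the directory already containing the Parquet output, no separate LOAD DATA step should be needed, and that Hive's Parquet SerDe matches columns to Parquet fields by name case-insensitively, so mixed-case field names like cdTransRsn should still resolve. Please correct me if either assumption is wrong.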