7 votes

I have a Parquet file on my Hadoop cluster. I want to capture the column names and their datatypes and write them to a text file. How do I get the column names and their datatypes of a Parquet file using PySpark?


2 Answers

11 votes

You can simply read the file and use schema to access individual fields:

sqlContext.read.parquet(path_to_parquet_file).schema.fields
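
Building on that, here is a minimal sketch of dumping each column's name and datatype to a text file, which is what the question asks for. The SparkSession setup, HDFS path, and output file name are placeholder assumptions, not part of the original answer:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("schema-dump").getOrCreate()

# Hypothetical input path; point this at your Parquet file on HDFS.
df = spark.read.parquet("hdfs:///path/to/file.parquet")

# schema.fields is a list of StructField objects, each carrying a
# column name and its DataType.
with open("schema.txt", "w") as out:  # hypothetical output file on the driver
    for field in df.schema.fields:
        out.write(f"{field.name}\t{field.dataType.simpleString()}\n")

Note that open() writes to the driver's local filesystem, which is usually fine for small metadata like a schema.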
3 votes

Use dataframe.printSchema(), which prints out the schema in tree format:

df.printSchema()

root
 |-- age: integer (nullable = true)
 |-- name: string (nullable = true)

You can redirect your program's output and capture it in a text file.
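
For example, assuming your script is named print_schema.py (a placeholder name):

spark-submit print_schema.py > schema.txt

printSchema() writes to standard output, so the redirect captures the schema tree shown above.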