Why do I have to convert an RDD to a DataFrame in order to write it as Parquet, Avro, or other formats? I know that writing an RDD directly in these formats is not supported. What I am actually trying to do is write a Parquet file whose first line contains only a header date and whose remaining lines contain the detail records. A sample file layout:
2019-04-06
101,peter,20000
102,robin,25000
I want to create a Parquet file with the above contents. I already have a CSV file, sample.csv, with these contents. When I read the CSV file as a DataFrame, it contains only the first field, because the first row has only one column.
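# sc is the SparkContext; split each line into a list of fields before converting
rdd = sc.textFile('hdfs://somepath/sample.csv').map(lambda line: line.split(','))
df = rdd.toDF()
df.show()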
Output:
2019-04-06
101
102
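As far as I know, the truncation comes from schema inference: `toDF()` without a schema infers it from the first row, and `spark.read.csv` without a schema takes the column count from the first line. Here that row is just `2019-04-06`, so the inferred schema has a single column and later rows are cut down to it. A minimal sketch of the fix is to pass an explicit three-column schema. The column names `col1..col3` and the helper functions are hypothetical, and pyspark is imported inside the functions so the parsing helper can be checked without Spark.

```python
# Minimal sketch: give Spark an explicit three-column schema so rows are not
# truncated to the single column inferred from the first line ("2019-04-06").
# The column names col1..col3 are hypothetical placeholders.

COLUMNS = ["col1", "col2", "col3"]


def pad_fields(line, width=len(COLUMNS)):
    """Split a CSV line and pad it with None so it has exactly `width` fields."""
    parts = line.split(",")[:width]
    return parts + [None] * (width - len(parts))


def all_string_schema():
    """Build a nullable all-string schema; pyspark is imported lazily."""
    from pyspark.sql.types import StringType, StructField, StructType

    return StructType([StructField(name, StringType(), True) for name in COLUMNS])


def read_via_rdd(spark, path):
    """RDD route: pad every line to the schema width, then convert."""
    rdd = spark.sparkContext.textFile(path).map(pad_fields)
    return rdd.toDF(all_string_schema())


def read_via_csv_reader(spark, path):
    """Reader route: in the default PERMISSIVE mode, records with fewer tokens
    than the schema get null in the missing columns."""
    return spark.read.csv(path, schema=all_string_schema())
```

With either function, the date row should come back with nulls in `col2` and `col3`, and the detail rows should keep all three fields.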
Could someone please help me convert the entire contents of the RDD into a DataFrame? The same thing happens even when I read the file directly as a DataFrame instead of converting it from an RDD.
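On the "why": in Spark, file-format writers such as `parquet` are methods of `DataFrame.write` (a `DataFrameWriter`), and they need a schema (column names and types), which a plain RDD of strings does not carry. A Parquet file also has one schema for all of its rows, so a one-field header row and three-field detail rows cannot be stored side by side as they are. A common workaround, assuming you only need the date available alongside each record, is to copy the header date into a column on every detail row. The sketch below assumes a single input file (so `first()` returns its first line), and the column names `header_date`, `id`, `name`, `salary` and all function names are hypothetical.

```python
from datetime import date

# Minimal sketch: Parquet stores one schema for the whole file, so instead of a
# one-field first row, the header date is copied onto every detail record.
# Column names (header_date, id, name, salary) are hypothetical.


def parse_detail(line, header_date):
    """Turn '101,peter,20000' into (header_date, 101, 'peter', 20000)."""
    emp_id, name, salary = line.strip().split(",")
    return (header_date, int(emp_id), name, int(salary))


def attach_header_date(lines):
    """Plain-Python version of the transformation, handy for testing:
    the first line is the date, every other non-blank line is a record."""
    it = iter(lines)
    header_date = date.fromisoformat(next(it).strip())
    return [parse_detail(line, header_date) for line in it if line.strip()]


def csv_to_parquet(spark, in_path, out_path):
    """Read the text file as an RDD, attach the date, convert, write Parquet."""
    from pyspark.sql.types import (
        DateType, IntegerType, StringType, StructField, StructType,
    )

    schema = StructType([
        StructField("header_date", DateType(), False),
        StructField("id", IntegerType(), False),
        StructField("name", StringType(), False),
        StructField("salary", IntegerType(), False),
    ])

    lines = spark.sparkContext.textFile(in_path)
    header = lines.first()  # first line of the (single) input file, e.g. '2019-04-06'
    header_date = date.fromisoformat(header.strip())

    rows = (
        lines.filter(lambda line: line != header and line.strip() != "")
        .map(lambda line: parse_detail(line, header_date))
    )

    # The RDD -> DataFrame step attaches the schema that the Parquet writer needs.
    df = spark.createDataFrame(rows, schema)
    df.write.mode("overwrite").parquet(out_path)
    return df
```

The same DataFrame can be written as Avro with `df.write.format("avro").save(out_path)`, provided the external spark-avro module is on the classpath (Spark 2.4 and later).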