I have an RDD[String] in which each String is one row of data in Hive text format. The Hive table exists in the Hive database, so I can get its schema. Is there a way to have Spark parse the RDD[String] into a DataFrame using that schema, so I don't need to do it manually?
1 Answer
If each string in your RDD[String] has a fixed structure such as (id, name, salary), you can define a case class in Scala, map your RDD[String] to an RDD of that case class, and then call toDF() to convert the RDD to a DataFrame.
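A minimal sketch of the case class approach, assuming Spark 2.x, the (id, name, salary) layout mentioned above, and Hive's default text field delimiter (Ctrl-A, `\u0001`). The `Employee` class, the object name, and the sample rows are made up for illustration; adjust the field types and delimiter to match your table.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical row layout; define it at top level so toDF() can derive the schema.
case class Employee(id: Int, name: String, salary: Double)

object CaseClassExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("rdd-string-to-df")
      .master("local[*]") // local mode, for trying this out only
      .getOrCreate()
    import spark.implicits._ // brings toDF() into scope

    // Assumption: fields are separated by Ctrl-A, Hive's default for text tables.
    val delimiter = "\u0001"

    // Stand-in for the real RDD[String].
    val rows = spark.sparkContext.parallelize(Seq(
      Seq("1", "Alice", "1000.0").mkString(delimiter),
      Seq("2", "Bob", "2000.5").mkString(delimiter)
    ))

    // Split each line (limit -1 keeps trailing empty fields) and build the case class.
    val df = rows
      .map(_.split(delimiter, -1))
      .map(f => Employee(f(0).toInt, f(1), f(2).toDouble))
      .toDF()

    df.printSchema()
    df.show()
    spark.stop()
  }
}
```

Note that this still spells out the schema by hand in the case class, and the `toInt`/`toDouble` conversions will throw on malformed or null fields, so real data may need extra handling.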
If your data is delimited and you are using Spark 2.x or later, you can use the built-in csv reader to create the DataFrame. On Spark 1.6.x or earlier, you can use the external spark-csv package for the same purpose.
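To avoid writing the schema by hand, one option is to borrow it from the Hive table and pass it to the csv reader. The sketch below assumes Spark 2.2 or later (where `DataFrameReader.csv` accepts a `Dataset[String]`), a session with Hive support, and Hive's default text serialization (Ctrl-A field delimiter, `\N` for nulls). The table name `mydb.mytable` and the input path are placeholders, not anything from your setup.

```scala
import org.apache.spark.sql.SparkSession

object HiveSchemaExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("rdd-string-with-hive-schema")
      .enableHiveSupport() // needed so spark.table can see the Hive metastore
      .getOrCreate()
    import spark.implicits._ // provides the encoder used by toDS()

    // Placeholder source for the RDD[String]; replace with your own RDD.
    val rows = spark.sparkContext.textFile("/path/to/hive/text/rows")

    // Reuse the table's schema instead of declaring it manually.
    val schema = spark.table("mydb.mytable").schema

    val df = spark.read
      .schema(schema)
      .option("sep", "\u0001")    // assumption: default Hive field delimiter
      .option("nullValue", "\\N") // assumption: default Hive null marker
      .csv(rows.toDS())

    df.printSchema()
    df.show(5)
    spark.stop()
  }
}
```

If the table was created with a custom delimiter or contains complex types (arrays, maps, structs), the csv reader will not parse those columns correctly, so check the table's definition first.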
Hope it helps.
Regards, Neeraj