Below is my Spark/Scala program that reads my source file (a CSV file).
val csv = spark.read
  .format("com.databricks.spark.csv")   // in Spark 2.0+ this is built in as .csv("csv/file/path")
  .option("header", "true")             // use the first row as the header
  // .option("mode", "DROPMALFORMED")
  .option("inferSchema", "true")
  .load("C:\\TestFiles\\SAP_ENT_INVBAL.csv")

csv.printSchema()
csv.show()
The output contains the file's header names, but for my processing I need a different naming convention than the one in the file header.
I have tried a couple of options and they work well:
- Renaming the dataframe columns
- Building the schema with the add(StructField(...)) function
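For reference, here is a minimal sketch of the renaming approach (the dataframe, column names, and values below are hypothetical stand-ins, not from my real file); toDF replaces all column names in one call:

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object RenameAllColumns {
  // Replace every column name at once; the order of `names` must match the existing columns.
  def rename(df: DataFrame, names: Seq[String]): DataFrame = df.toDF(names: _*)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("RenameAllColumns").getOrCreate()
    import spark.implicits._

    // Stand-in for the CSV read: a small dataframe whose names came from the file header.
    val csv = Seq((1, "P100"), (2, "P200")).toDF("MATNR", "WERKS")

    val renamed = rename(csv, Seq("material", "plant"))   // hypothetical target names
    renamed.printSchema()
    spark.stop()
  }
}
```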
But I want to make my code generic: just pass a schema file while reading the source file, and create the dataframe with columns named according to that schema file.
Kindly help me solve this.
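To show what I mean, here is a sketch of the direction I am aiming for. It assumes a hypothetical schema file ("C:\\TestFiles\\schema.txt") with one "name,type" pair per line (e.g. "inv_id,int"); the schema is built from that file and passed to the reader via .schema(...), so the dataframe gets the schema file's names instead of the CSV header's:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.{DataType, DoubleType, IntegerType, StringType, StructField, StructType}
import scala.io.Source

object CsvWithSchemaFile {
  // Parse one "name,type" line from the schema file into a (name, type) pair.
  def parseLine(line: String): (String, String) = {
    val Array(name, tpe) = line.split(",").map(_.trim)
    (name, tpe)
  }

  // Map a type label from the schema file to a Spark DataType (extend as needed).
  def toDataType(label: String): DataType = label.toLowerCase match {
    case "int" | "integer" => IntegerType
    case "double"          => DoubleType
    case _                 => StringType
  }

  // Build a StructType from the schema file's lines.
  def buildSchema(lines: Iterator[String]): StructType =
    StructType(lines.map(parseLine).map { case (n, t) => StructField(n, toDataType(t)) }.toSeq)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("CsvWithSchemaFile").getOrCreate()

    val schema = buildSchema(Source.fromFile("C:\\TestFiles\\schema.txt").getLines())

    val df = spark.read
      .option("header", "true")   // still skip the header row; names now come from the schema file
      .schema(schema)             // apply the schema instead of inferring it
      .csv("C:\\TestFiles\\SAP_ENT_INVBAL.csv")

    df.printSchema()
    df.show()
    spark.stop()
  }
}
```

Note that when an explicit schema is supplied, inferSchema is no longer needed, which also saves an extra pass over the data.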