I am trying to read a gzipped text file (.gz) with Spark 2.0 using SparkSession.
The field separator is ';'. The first few fields are loaded properly, but the last few fields, where the data doesn't exist, are not being read by Spark.
For example, everything up to ...h;7 is read by Spark, but nothing after that. Null fields are handled if they come before h;7;.
Why is Spark ignoring the last fields?
File Format:
1;2;6;;;;;h;7;;;;;;;;;
Code:
JavaRDD<mySchema> peopleRDD = spark.read()
    .textFile("file:///app/home/emm/zipfiles/myzips/")
    .javaRDD()
    .map(new Function<String, mySchema>()
    {
        @Override
        public mySchema call(String line) throws Exception
        {
            String[] parts = line.split(";");
            mySchema mySchema = new mySchema();
            mySchema.setCFIELD1(parts[0]);
            mySchema.setCFIELD2(parts[1]);
            mySchema.setCFIELD3(parts[2]);
            mySchema.setCFIELD4(parts[3]);
            mySchema.setCFIELD5(parts[4]);
            ................................
            ................................
            return mySchema;
        }
    });
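
One thing worth checking: the behaviour described may come from Java's `String.split` rather than from Spark. With a single argument, `split` discards trailing empty strings, so the empty fields after `h;7` never make it into `parts`. Below is a minimal sketch of the difference, assuming that is the cause. The class name `SplitTrailingFields` and the helper `splitKeepingEmpties` are made up for illustration and are not part of the code above.

```java
public class SplitTrailingFields {

    // Hypothetical helper: splits a line on ';' and keeps trailing empty fields.
    // A negative limit tells String.split not to discard trailing empty strings.
    public static String[] splitKeepingEmpties(String line) {
        return line.split(";", -1);
    }

    public static void main(String[] args) {
        // Sample line taken from the question.
        String line = "1;2;6;;;;;h;7;;;;;;;;;";

        // Single-argument split: trailing empty fields are removed.
        String[] dropped = line.split(";");

        // Split with limit -1: every field is kept, including empty ones at the end.
        String[] kept = splitKeepingEmpties(line);

        System.out.println("split(\";\")     -> " + dropped.length + " fields");
        System.out.println("split(\";\", -1) -> " + kept.length + " fields");
    }
}
```

If this is indeed what is happening, replacing `line.split(";")` with `line.split(";", -1)` inside `call` should make the trailing fields show up as empty strings, which can then be mapped to null if needed.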