I would like to avoid null values for the fields of a class used in a Dataset. I tried Scala's Option
and Java's Optional,
but both failed:
    @AllArgsConstructor // lombok
    @NoArgsConstructor // mutable type is required in java :(
    @Data // see https://stackoverflow.com/q/59609933/1206998
    public static class TestClass {
        String id;
        Option<Integer> optionalInt;
    }

    @Test
    public void testDatasetWithOptionField() {
        Dataset<TestClass> ds = spark.createDataset(Arrays.asList(
            new TestClass("item 1", Option.apply(1)),
            new TestClass("item .", Option.empty())
        ), Encoders.bean(TestClass.class));
        ds.collectAsList().forEach(x -> System.out.println("Found " + x));
    }
This fails at runtime with the message: File 'generated.java', Line 77, Column 47: Cannot instantiate abstract "scala.Option"
Question: Is there a way to encode optional fields without null in a Dataset, using Java?
Subsidiary question: I haven't used Datasets much in Scala either. Can you confirm that it is actually possible in Scala to encode a case class containing Option fields?
Note: This is used in an intermediate Dataset, i.e. one that is neither read from nor written to storage (it only goes through Spark's internal serialization).
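For comparison, here is a minimal sketch of the fallback this question is trying to avoid. The stored property stays a nullable Integer, and the value is wrapped in a java.util.Optional only at the accessor. The class name NullableTestClass and the accessor optionalInt() are hypothetical. The sketch assumes that Encoders.bean only picks up getter/setter pairs and therefore ignores a method that is not named like a getter; that assumption has not been checked against Spark here. It does not remove the null from the encoded data, it only hides it from callers.

```java
import java.util.Optional;

// Hypothetical variant of TestClass: the bean property is a plain nullable Integer,
// so a bean encoder would only ever see String and Integer fields.
class NullableTestClass {
    private String id;
    private Integer optionalInt; // null means "absent"

    public NullableTestClass() {} // no-arg constructor, needed for a mutable bean

    public NullableTestClass(String id, Integer optionalInt) {
        this.id = id;
        this.optionalInt = optionalInt;
    }

    public String getId() { return id; }
    public void setId(String id) { this.id = id; }

    public Integer getOptionalInt() { return optionalInt; }
    public void setOptionalInt(Integer optionalInt) { this.optionalInt = optionalInt; }

    // Deliberately not named getXxx: assumed to be skipped by bean introspection.
    public Optional<Integer> optionalInt() {
        return Optional.ofNullable(optionalInt);
    }
}
```

Calling code would then read the field through optionalInt() and never touch getOptionalInt() directly, e.g. new NullableTestClass("item 1", 1).optionalInt().isPresent().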