I have a Spark Dataset<T>
loaded from a Cassandra table, and I want to apply a list of operations (a chain, or pipeline) to this dataset.
For example:
Dataset<T> dataset = sparkSession.createDataset(javaFunctions(spark.sparkContext())
        .cassandraTable(...));
Dataset<Row> result = dataset.apply(func1()).apply(func2()).apply(func3());
func1() will replace null values with the most frequent ones.
func2() will add new columns with new values.
func3() ... etc.
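Conceptually, what I am after is ordinary function composition. A minimal plain-Java sketch of the pattern (the class name `PipelineSketch`, the helper `chain`, and the String-based stand-ins for func1/func2/func3 are all hypothetical, just to show the mechanics without Spark):

```java
import java.util.List;
import java.util.function.Function;

public class PipelineSketch {

    // Fold a list of same-typed transformations into one composed function,
    // applied left to right (first element of the list runs first).
    static <T> Function<T, T> chain(List<Function<T, T>> steps) {
        return steps.stream().reduce(Function.identity(), Function::andThen);
    }

    public static void main(String[] args) {
        // Stand-ins for func1/func2/func3, operating on a String instead of
        // a Dataset<T>, purely to illustrate the chaining.
        Function<String, String> trim = String::trim;
        Function<String, String> upper = String::toUpperCase;
        Function<String, String> exclaim = s -> s + "!";

        String result = chain(List.of(trim, upper, exclaim)).apply("  hello ");
        System.out.println(result); // HELLO!
    }
}
```

For the Dataset case itself, I believe Dataset has a transform method meant for this kind of chaining (dataset.transform(func1).transform(func2)...), but I am not sure whether that is the idiomatic approach here.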
What is the best way to apply this pipeline of functions?