I have a dataframe with 100 columns and col names like col1, col2, col3.... I want to apply certain transformation on the values of columns based on condition matches. I can store the column names in a array of string. And pass the value each element of the array in withColumn and based on When condition i can transform the values of the column vertically. But the question is, as Dataframe is immutable, so each updated version is need to store in a new variable and also new dataframe need to pass in withColumn to transform for next iteration. Is there any way to create array of dataframe so that new dataframe can be stored as a element of array and it can iterate based on the value of iterator. Or is there any other way to handle the same.
var arr_df : Array[DataFrame] = new Array[DataFrame](60)
--> This throws error "not found type DataFrame"
val df(0) = df1.union(df2)
for(i <- 1 to 99){
val df(i) = df(i-1).withColumn(col(i), when(col(i)> 0, col(i) +
1).otherwise(col(i)))
Here col(i) is an array of strings that stores the name of the columns of the original datframe .
As a example :
scala> val original_df = Seq((1,2,3,4),(2,3,4,5),(3,4,5,6),(4,5,6,7),(5,6,7,8),(6,7,8,9)).toDF("col1","col2","col3","col4")
original_df: org.apache.spark.sql.DataFrame = [col1: int, col2: int ... 2 more fields]
scala> original_df.show()
+----+----+----+----+
|col1|col2|col3|col4|
+----+----+----+----+
| 1| 2| 3| 4|
| 2| 3| 4| 5|
| 3| 4| 5| 6|
| 4| 5| 6| 7|
| 5| 6| 7| 8|
| 6| 7| 8| 9|
+----+----+----+----+
I want to iterate 3 columns : col1, col2, col3 if the value of that column is greater than 3, then it will be updated by +1