I would like to transform some columns in my dataframe based on configuration represented by Scala maps.
I have 2 cases:
- Receiving a map Map[String, Seq[String]] and columns col1, col2: transform col3 if the map contains an entry with key = col1, and col2 is in that entry's value list.
- Receiving a map Map[String, (Long, Long)] and columns col1, col2: transform col3 if the map contains an entry with key = col1, and col2 is in the range described by the tuple of Longs as (start, end).
Examples:
Case 1: given this table and the map Map("u1" -> Seq("w1", "w11"), "u2" -> Seq("w2", "w22"))
+------+------+------+
| col1 | col2 | col3 |
+------+------+------+
| u1 | w1 | v1 |
+------+------+------+
| u2 | w2 | v2 |
+------+------+------+
| u3 | w3 | v3 |
+------+------+------+
I would like to add an "x-" prefix to col3, but only if the row matches:
+------+------+------+
| col1 | col2 | col3 |
+------+------+------+
| u1 | w1 | x-v1 |
+------+------+------+
| u2 | w2 | x-v2 |
+------+------+------+
| u3 | w3 | v3 |
+------+------+------+
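A UDF-free sketch for case 1, assuming a SparkSession is in scope and `df` is a DataFrame with these three columns: since the map lives on the driver, its entries can be folded into a single Column predicate at plan-build time.

```scala
import org.apache.spark.sql.functions.{col, concat, lit, when}

val rules: Map[String, Seq[String]] =
  Map("u1" -> Seq("w1", "w11"), "u2" -> Seq("w2", "w22"))

// Build one boolean Column by OR-ing a predicate per map entry:
// (col1 = key AND col2 IN values)
val matches = rules
  .map { case (key, values) =>
    col("col1") === key && col("col2").isin(values: _*)
  }
  .reduce(_ || _)

// Prefix col3 with "x-" only on matching rows; leave the rest untouched.
val result = df.withColumn(
  "col3",
  when(matches, concat(lit("x-"), col("col3"))).otherwise(col("col3")))
```

Because the condition is built from native Column expressions, Catalyst can optimize it, unlike an opaque UDF.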
Case 2: given this table and the map Map("u1" -> (1, 5), "u2" -> (2, 4))
+------+------+------+
| col1 | col2 | col3 |
+------+------+------+
| u1 | 2 | v1 |
+------+------+------+
| u1 | 6 | v11 |
+------+------+------+
| u2 | 3 | v3 |
+------+------+------+
| u3 | 4 | v3 |
+------+------+------+
expected output should be:
+------+------+------+
| col1 | col2 | col3 |
+------+------+------+
| u1 | 2 | x-v1 |
+------+------+------+
| u1 | 6 | v11 |
+------+------+------+
| u2 | 3 | x-v3 |
+------+------+------+
| u3 | 4 | v3 |
+------+------+------+
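Case 2 follows the same pattern, swapping the `isin` check for a range check (a sketch, again assuming `df` holds the table above; note `between` is inclusive on both bounds, which matches the expected output since u1's col2 = 6 falls outside (1, 5)):

```scala
import org.apache.spark.sql.functions.{col, concat, lit, when}

val ranges: Map[String, (Long, Long)] =
  Map("u1" -> (1L, 5L), "u2" -> (2L, 4L))

// One predicate per entry: col1 equals the key AND col2 falls in the
// inclusive (start, end) range; OR all predicates together.
val inRange = ranges
  .map { case (key, (start, end)) =>
    col("col1") === key && col("col2").between(start, end)
  }
  .reduce(_ || _)

val result = df.withColumn(
  "col3",
  when(inRange, concat(lit("x-"), col("col3"))).otherwise(col("col3")))
```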
This can easily be done with UDFs, but for performance reasons I would like to avoid them.
Is there a way to achieve this without UDFs in Spark 2.4.2?
Thanks
Could Map("u1" -> (1, 5), "u2" -> (2, 4)) be converted to Map("u1" -> Seq(1, 5), "u2" -> Seq(2, 4))? – Srinivas