I'm quite new both Spark and Scale and could really need a hint to solve my problem. So I have two DataFrames A (columns id and name) and B (columns id and text) would like to join them, group by id and combine all rows of text into a single String:
A
+--------+--------+
| id| name|
+--------+--------+
| 0| A|
| 1| B|
+--------+--------+
B
+--------+ -------+
| id| text|
+--------+--------+
| 0| one|
| 0| two|
| 1| three|
| 1| four|
+--------+--------+
desired result:
+--------+--------+----------+
| id| name| texts|
+--------+--------+----------+
| 0| A| one two|
| 1| B|three four|
+--------+--------+----------+
So far I'm trying the following:
var C = A.join(B, "id")
var D = C.groupBy("id", "name").agg(collect_list("text") as "texts")
This works quite well besides that my texts column is an Array of Strings instead of a String. I would appreciate some help very much.