
I am new to Spark SQL and I have a column of type array with data like below:

[{"X":"A11"},{"X":"A12"},{"X":"A13"}]

The output I am looking for is a string field as

A11, A12, A13

I cannot explode the array as I need the data in one row.

Since the maximum length of the array in my case is 6, I got it to work with the CASE statement below.

case 
    when size(arr)=1 then array_join(map_values(map_concat(arr[0])),',') 
    when size(arr)=2 then array_join(map_values(map_concat(arr[0],arr[1])),',') 
    when size(arr)=3 then array_join(map_values(map_concat(arr[0],arr[1],arr[2])),',') 
    when size(arr)=4 then array_join(map_values(map_concat(arr[0],arr[1],arr[2],arr[3])),',') 
    when size(arr)=5 then array_join(map_values(map_concat(arr[0],arr[1],arr[2],arr[3],arr[4])),',') 
    when size(arr)=6 then array_join(map_values(map_concat(arr[0],arr[1],arr[2],arr[3],arr[4],arr[5])),',') 
else 
    null
end

Is there a better way to do this?


1 Answer


Assuming that the source and result columns are col and values respectively, it can be implemented as follows:

from pyspark.sql import SparkSession
import pyspark.sql.functions as F

spark = SparkSession.builder.getOrCreate()

data = [
    ([{"X": "A11"}, {"X": "A12"}, {"X": "A13"}],)
]
df = spark.createDataFrame(data, ['col'])

# Extract the values of each map, flatten them into a single array,
# then join the elements into one comma-separated string
df = df.withColumn('values', F.array_join(F.flatten(F.transform('col', lambda x: F.map_values(x))), ','))
df.show(truncate=False)
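If you need to stay in pure Spark SQL, as in the question, the same logic can be written with the built-in higher-order functions transform and flatten (available since Spark 2.4). This is a sketch assuming the array column is named arr, as in the question's CASE statement:

    array_join(flatten(transform(arr, x -> map_values(x))), ',')

This single expression replaces the whole CASE block and works for any array length, including empty arrays (which yield an empty string rather than null).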