1 vote

Is it possible to cast a StringType column to an ArrayType column in a Spark DataFrame?

df.printSchema() gives this:

    a: string (nullable = true)

Now I want to convert it to:

    a: array (nullable = true)

You can't cast; you have to split it. – eliasah
How can I split it? Could you explain it with an example? – khrystal
Would you care to at least give a data sample? – eliasah
a = [{val1:"somevalue_x", val2:"somevalue_y"}, {val1:"someValue_z", val2:"someValue_v"}], currently a is a string and I want it as an array. – khrystal
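
Given that sample, the string looks like JSON with unquoted keys. As a minimal sketch, assuming Spark 2.2+ (where from_json accepts an ArrayType schema) and the val1/val2 keys shown above, the string could be parsed into a real array of structs:

    import org.apache.spark.sql.functions.{col, from_json}
    import org.apache.spark.sql.types._

    // Schema matching the sample: an array of objects with val1/val2 fields
    val schema = ArrayType(StructType(Seq(
      StructField("val1", StringType),
      StructField("val2", StringType)
    )))

    // allowUnquotedFieldNames copes with the unquoted keys in the sample
    val parsed = df.withColumn("a",
      from_json(col("a"), schema, Map("allowUnquotedFieldNames" -> "true")))
    parsed.printSchema()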

2 Answers

5 votes

As eliasah commented, you have to split your string. You can use a UDF:

    df.printSchema

    import org.apache.spark.sql.functions._

    // UDF that splits the string on spaces into an Array[String]
    val toArray = udf[Array[String], String](_.split(" "))
    val featureDf = df
      .withColumn("a", toArray(df("a")))

    featureDf.printSchema

Gives output:

    root
     |-- a: string (nullable = true)

    root
     |-- a: array (nullable = true)
     |    |-- element: string (containsNull = true)
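
If the column really is just a delimited string, the same result is also possible without a UDF via the built-in split function from org.apache.spark.sql.functions (a sketch, assuming a space delimiter as above):

    import org.apache.spark.sql.functions.{col, split}

    // split returns an ArrayType(StringType) column directly, no UDF required
    val featureDf = df.withColumn("a", split(col("a"), " "))
    featureDf.printSchema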

0 votes

Another option is to simply wrap the column in functions.array:

    import org.apache.spark.sql.functions
    import org.apache.spark.sql.functions.col

    df.withColumn("a", functions.array(col("a")))
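
Note that array does not parse or split the string; it just wraps the existing value in a one-element array. A quick sketch of the effect (the nullability flags shown are my assumption for a nullable input column):

    import org.apache.spark.sql.functions.{array, col}

    val wrapped = df.withColumn("a", array(col("a")))
    wrapped.printSchema()
    // root
    //  |-- a: array (nullable = false)
    //  |    |-- element: string (containsNull = true)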