I have a DataFrame in Spark with the following schema:
StructType(List(StructField(id,StringType,true),
StructField(daily_id,StringType,true),
StructField(activity,StringType,true)))
The activity column is a string. Sample content:
{1.33,0.567,1.897,0,0.78}
I need to cast the activity column to an ArrayType(DoubleType).
To do that, I ran the following command:
df = df.withColumn("activity", split(col("activity"), r",\s*").cast(ArrayType(DoubleType())))
The new schema of the dataframe changed accordingly:
StructType(List(StructField(id,StringType,true),
StructField(daily_id,StringType,true),
StructField(activity,ArrayType(DoubleType,true),true)))
However, the data now looks like this: [NULL,0.567,1.897,0,NULL]
It changed the first and last element of the array of strings to NULL. I can't figure out why Spark is doing this with the dataframe.
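While trying to narrow this down, I put together a minimal plain-Python sketch of what I think the split step produces. It assumes that Spark's split handles this pattern the same way Python's re.split does for this input, and that a failed string-to-double cast yields NULL. The helper to_double_or_none is hypothetical and only there for illustration.

```python
import re

# Sample value of the activity column, copied from above.
sample = "{1.33,0.567,1.897,0,0.78}"

# Same pattern as in the split() call: a comma followed by optional whitespace.
tokens = re.split(r",\s*", sample)
print(tokens)  # ['{1.33', '0.567', '1.897', '0', '0.78}']


def to_double_or_none(token):
    """Hypothetical stand-in for the cast: None when the token is not a number."""
    try:
        return float(token)
    except ValueError:
        return None


casted = [to_double_or_none(t) for t in tokens]
print(casted)  # [None, 0.567, 1.897, 0.0, None]
```

If that assumption holds, the first and last tokens still carry the curly braces from the original string, which might be related to the NULLs, but I have not confirmed this in Spark itself.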
Can you please help me understand what the issue is?
Many thanks.