2 votes

I'm trying to read data from a JSON file that has an array of lat/long values, something like [48.597315,-43.206085], and I want to parse them in Spark SQL as a single string. Is there a way I can do that?

My JSON input looks something like this:

{"id":"11700","position":{"type":"Point","coordinates":[48.597315,-43.206085]}

I'm trying to push this to an RDBMS store, and when I try to cast position.coordinates to string it gives me

Can't get JDBC type for array<string> 

since the destination datatype is nvarchar. Any help is appreciated!
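
For context, the failing write looks roughly like the sketch below; the connection details are hypothetical placeholders:

// hypothetical connection details, for illustration only
val props = new java.util.Properties()
props.setProperty("user", "...")
props.setProperty("password", "...")

// fails while coordinates is still an array column: Spark's JDBC writer
// has no SQL type mapping for array types
df.select($"id", $"position.coordinates")
  .write.mode("append")
  .jdbc("jdbc:sqlserver://host:1433;databaseName=mydb", "positions", props)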

I assume that you will get an answer much faster if you actually provide a minimal reproducible example that shows clearly what the problem is. Currently, it's a bit too short... - Andrey Tyukin
Updated the question. @AndreyTyukin - knowledge_seeker

2 Answers

1 vote

You can read your JSON file into a DataFrame, then (1) use concat_ws to stringify your lat/lon array into a single column, and (2) use struct to re-assemble the position struct-type column, as follows:

// jsonfile:
// {"id":"11700","position":{"type":"Point","coordinates":[48.597315,-43.206085]}}

import org.apache.spark.sql.functions._
val df = spark.read.json("/path/to/jsonfile")

// printSchema:
// root
//  |-- id: string (nullable = true)
//  |-- position: struct (nullable = true)
//  |    |-- coordinates: array (nullable = true)
//  |    |    |-- element: double (containsNull = true)
//  |    |-- type: string (nullable = true)

df.withColumn("coordinates", concat_ws(",", $"position.coordinates")).
  select($"id", struct($"coordinates", $"position.type").as("position")).
  show(false)
// +-----+----------------------------+
// |id   |position                    |
// +-----+----------------------------+
// |11700|[48.597315,-43.206085,Point]|
// +-----+----------------------------+

// printSchema:
// root
//  |-- id: string (nullable = true)
//  |-- position: struct (nullable = false)
//  |    |-- coordinates: string (nullable = false)
//  |    |-- type: string (nullable = true)

[UPDATE]

Using Spark SQL:

df.createOrReplaceTempView("position_table")

spark.sql("""
    select id, concat_ws(',', position.coordinates) as position_coordinates
    from position_table
  """).
  show(false)
//+-----+--------------------+
//|id   |position_coordinates|
//+-----+--------------------+
//|11700|48.597315,-43.206085|
//|11800|49.611254,-43.90223 |
//+-----+--------------------+
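
With coordinates flattened to a plain string, the result should write over JDBC without the array-type error. A minimal sketch, assuming a SQL Server target with placeholder connection details:

val flat = spark.sql("select id, concat_ws(',', position.coordinates) as position_coordinates from position_table")

// hypothetical connection details; both columns are now plain strings,
// so the JDBC writer can map them (e.g. to nvarchar)
val props = new java.util.Properties()
props.setProperty("user", "...")
props.setProperty("password", "...")
flat.write.mode("append").jdbc("jdbc:sqlserver://host:1433;databaseName=mydb", "positions", props)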
0 votes

You have to transform the given column into a string before loading it into the target data source. For example, the following code creates a new column named position.coordinates whose value is the array of doubles joined into a single string: the array is cast to its string representation, and the brackets are then removed.

df.withColumn("position.coordinates", regexp_replace($"position.coordinates".cast("string"), "\\[|\\]", ""))

Alternatively, you can use a UDF to create a custom transformation function on Row objects. That way you can maintain the nested structure of the column. The following question (see its second answer) can give you an idea of how to use a UDF for your case: Spark UDF with nested structure as input parameter.
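
For illustration, a minimal sketch of the UDF approach under the schema shown above. Rather than operating on Row directly, this simpler variant takes the array itself and re-assembles the struct afterward, as in the first answer (joinCoords is a hypothetical name):

import org.apache.spark.sql.functions.{struct, udf}
import spark.implicits._

// hypothetical UDF that joins the lat/lon array into one comma-separated string
val joinCoords = udf((coords: Seq[Double]) => coords.mkString(","))

df.select($"id",
    struct(joinCoords($"position.coordinates").as("coordinates"), $"position.type").as("position"))
  .show(false)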