
I have a JSON that looks like this:

 "mapping_field" : {
        "values" : {
            "key1" : {
                "id" : "key1", 
                "field1" : "value1", 
                "field2" : "value2", 
            }, 
            "key2" : {
                "id" : "key2", 
                "field1" : "value3", 
                "field2" : "value4", 
            }
        }, 
        "keys" : [
            "key1", 
            "key2"
        ]
}

I am trying to map this structure to a Spark schema. I have already created the following; however, it's not working. I have also tried removing the ArrayType in the "values" field mapping.

StructType("mapping_field",
    MapType(
        StructField("keys", ArrayType(StringType())),
        StructField("values", ArrayType(StructType([
            StructField("id",StringType()),
            StructField("field1",StringType()),
            StructField("field2",StringType())
        ])))
    )
)

Also, please note that "key1" and "key2" are dynamic fields generated with a unique identifier, and there can be more than two keys. Has anyone been able to map an ArrayType to a StructType?

I don't think that MapType is the right type for that. I would nest a StructType inside a StructType for the nested dictionaries, and use an ArrayType for the "keys" part of your JSON. - daguito81

1 Answer


The struct type for the provided JSON (use a MapType for "values", since its keys are dynamic):

import org.apache.spark.sql.functions._
import org.apache.spark.sql.types.{ArrayType, MapType, StructField, StructType, StringType}

val json = """ {
    "mapping_field" : {
            "values" : {
                "key1" : {
                    "id" : "key1",
                    "field1" : "value1",
                    "field2" : "value2"
                },
                "key2" : {
                    "id" : "key2",
                    "field1" : "value3",
                    "field2" : "value4"
                }
            },
            "keys" : [
                "key1",
                "key2"
            ]
    }
  }
  """


val struct = StructType(
  StructField("mapping_field", StructType(
    StructField("values", MapType(StringType, StructType(
      StructField("id", StringType, false) ::
      StructField("field1", StringType, false) ::
      StructField("field2", StringType, false) :: Nil)
    ), false) ::
    StructField("keys", ArrayType(StringType), false) :: Nil
  ), false) :: Nil
)

import spark.implicits._
val df = List(json)
    .toDF("json_col")
    .select(from_json($"json_col", struct))