1
votes

I have a JSON file like below :

{"Codes":[{"CName":"012","CValue":"XYZ1234","CLevel":"0","msg":"","CType":"event"},{"CName":"013","CValue":"ABC1234","CLevel":"1","msg":"","CType":"event"}}

I want to create a schema for this, and if the JSON file is empty ({}), the result should be an empty string.

However, this is the output I get when I call df.show:

[[012, XYZ1234, 0, event, ], [013, ABC1234, 1, event, ]]

I created the schema as follows:

import org.apache.spark.sql.types._

val schemaF = ArrayType(
  StructType(
    Array(
      StructField("CName", StringType),
      StructField("CValue", StringType),
      StructField("CLevel", StringType),
      StructField("msg", StringType),
      StructField("CType", StringType)
    )
  )
)

When I tried the following,

val df1 = df.withColumn("Codes",from_json('Codes, schemaF))

it throws an AnalysisException:

org.apache.spark.sql.AnalysisException: cannot resolve 'jsontostructs(Codes)' due to data type mismatch: argument 1 requires string type, however, 'Codes' is of array<struct<CName:string,CValue:string,CLevel:string,CType:string,msg:string>> type.;; 'Project [valid#51, jsontostructs(ArrayType(StructType(StructField(CName,StringType,true), StructField(CValue,StringType,true), StructField(CLevel,StringType,true), StructField(msg,StringType,true), StructField(CType,StringType,true)),true), Codes#8, Some(America/Bogota)) AS errorCodes#77]

Can someone please tell me why and how to resolve this issue?

2
The Codes column is already of type array of struct, so why do you want to use from_json? – blackbishop
I see that your JSON file is not well defined; where is the closing ] for your array? – itIsNaz
I forgot to copy the ] bracket. @itIsNaz – YOGESH S
Because if Codes is empty (i.e. { "Codes": [] }), I want to make use of the schema. @blackbishop – YOGESH S

2 Answers

0
votes

val schema = StructType(
  Array(
    StructField("CName", StringType),
    StructField("CValue", StringType),
    StructField("CLevel", StringType),
    StructField("msg", StringType),
    StructField("CType", StringType)
  )
)

val df0 = spark.read.schema(schema).json("/path/to/data.json")
0
votes

Your schema does not correspond to the JSON file you're trying to read: it's missing the field Codes of array type. It should look like this:

val schema = StructType(
  Array(
    StructField(
      "Codes",
      ArrayType(
        StructType(
          Array(
            StructField("CLevel", StringType, true),
            StructField("CName", StringType, true),
            StructField("CType", StringType, true),
            StructField("CValue", StringType, true),
            StructField("msg", StringType, true)
          )
        ), true)
      ,true)
  )
)

And you want to apply it when reading the JSON, not with the from_json function:

val df = spark.read.schema(schema).json("path/to/json/file")

df.printSchema
//root
// |-- Codes: array (nullable = true)
// |    |-- element: struct (containsNull = true)
// |    |    |-- CLevel: string (nullable = true)
// |    |    |-- CName: string (nullable = true)
// |    |    |-- CType: string (nullable = true)
// |    |    |-- CValue: string (nullable = true)
// |    |    |-- msg: string (nullable = true)
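
If you also need the empty-file case ({}) from the question to come out as an empty string, one possible sketch (not part of the original answer, and assuming Spark 2.4+, where to_json accepts array columns) is to serialize the array back to a JSON string and default to "":

```scala
import org.apache.spark.sql.functions.{lit, size, to_json, when}

// Hypothetical follow-up: turn Codes into a JSON string column,
// falling back to "" when the array is null or empty.
val dfStr = df.withColumn(
  "CodesStr",
  when($"Codes".isNull || size($"Codes") === 0, lit(""))
    .otherwise(to_json($"Codes"))
)
```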

EDIT:

To answer the question in your comment, you can use this schema definition:

val schema = StructType(
  Array(
    StructField(
      "Codes",
      ArrayType(
        StructType(
          Array(
            StructField("CLevel", StringType, true),
            StructField("CName", StringType, true),
            StructField("CType", StringType, true),
            StructField("CValue", StringType, true),
            StructField("msg", StringType, true)
          )
        ), true)
      ,true),
    StructField("lid", StructType(Array(StructField("idNo", StringType, true))), true)
  )
)
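
A quick usage sketch for this extended schema (the lid/idNo values here are illustrative only, assuming a SparkSession in scope as spark):

```scala
import spark.implicits._

// Illustrative input matching the extended schema.
val sample = Seq(
  """{"Codes":[{"CName":"012","CValue":"XYZ1234","CLevel":"0","msg":"","CType":"event"}],"lid":{"idNo":"42"}}"""
).toDS

val dfLid = spark.read.schema(sche_
ma).json(sample)
dfLid.select($"lid.idNo").show
```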