I have the nested schema below:

root
|-- fields: struct (nullable = true)
|    |-- configdata: struct (containsNull = true)
|    |    |-- field: string (nullable = true)
|    |    |-- type: string (nullable = true)
|    |    |-- value: string (nullable = true)
|    |-- configdata: struct (containsNull = true)
|    |    |-- field1: string (nullable = true)
|    |    |-- type1: string (nullable = true)
|    |    |-- value1: string (nullable = true)
|-- id: string (nullable = true)
|-- score: double (nullable = true)
|-- siteId: string (nullable = true)

I have to read both configdata properties from this JSON. But when I try:

newDf = dataframe.select(sf.array(sf.expr("configdata")))

it fails with the exception:

Ambiguous reference to fields StructField(configdata)

As you can see, configdata is of struct type, and I have to read both configdata entries from this JSON. The code is written in PySpark using the Spark DataFrame API. Can someone please help?

1 Answer

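There are two structs here, and no array is evident. The reference is ambiguous because both structs have the same name at the same level, so Spark cannot tell which one you mean. Reading both is not possible as the data stands. Give the second one a new name, e.g. 'configdata1', at the source.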
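
If the source cannot be changed upstream, one way to do the rename is to pre-process each JSON record before Spark sees it. The following is only a minimal sketch in plain Python: the helper name rename_duplicate_keys and the sample record are made up for illustration, and it assumes each record is available as a JSON string. It relies on the standard library's json.loads with object_pairs_hook, which receives every key/value pair, including repeated keys.

```python
import json


def rename_duplicate_keys(pairs):
    """Build a dict from (key, value) pairs, suffixing repeated keys.

    Hypothetical helper: the first occurrence keeps its name, later
    occurrences become key1, key2, ... It does not guard against a
    suffixed name colliding with a key that already exists.
    """
    result = {}
    seen = {}
    for key, value in pairs:
        count = seen.get(key, 0)
        seen[key] = count + 1
        new_key = key if count == 0 else f"{key}{count}"
        result[new_key] = value
    return result


# Made-up sample record with the same duplicate-key shape as the schema.
raw = (
    '{"fields": {"configdata": {"field": "a"}, '
    '"configdata": {"field1": "b"}}, "id": "1"}'
)

# The hook is applied to every JSON object, nested ones included.
cleaned = json.loads(raw, object_pairs_hook=rename_duplicate_keys)
print(json.dumps(cleaned))
```

Once the records are rewritten this way and loaded into a DataFrame, the two structs have distinct names and can be selected separately, for example with dataframe.select("fields.configdata", "fields.configdata1").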