
I'm getting the following error when attempting to flatten a highly nested structure:

org.apache.spark.sql.AnalysisException: Ambiguous reference to fields StructField(error,StructType(StructField(array,ArrayType(StructType(StructField(double,DoubleType,true), StructField(int,IntegerType,true), StructField(string,StringType,true)),true),true), StructField(double,DoubleType,true), StructField(int,IntegerType,true), StructField(string,StringType,true), StructField(struct,StructType(StructField(message,StringType,true), StructField(kind,StringType,true), StructField(stack,StringType,true)),true)),true), StructField(Error,StructType(StructField(array,ArrayType(StringType,true),true), StructField(string,StringType,true)),true)

I can't seem to figure out what in particular is causing this. What is the ambiguity, other than a deeply nested Struct?

Possible duplicate of stackoverflow.com/questions/66462194/… Take a look at the schema in the linked question. You probably have two fields on the same level with the same name. Also, when you run into an issue and post it on SO, please provide an example of the schema and a dataframe. - newbie
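For what it's worth, the two fields named in the error message, `error` and `Error`, differ only in letter case. Spark resolves column names case-insensitively unless `spark.sql.caseSensitive` is set to `true`, so they may well be the colliding pair. Below is a rough sketch for listing such sibling collisions in a schema. `SchemaCollisions` and `find` are hypothetical names made up for this illustration, not Spark API, and map types are deliberately not traversed.

```scala
import org.apache.spark.sql.types._

// Hypothetical helper (not part of Spark): reports sibling fields whose
// names are equal once letter case is ignored.
object SchemaCollisions {
  def find(dt: DataType, prefix: String = ""): Seq[String] = dt match {
    case st: StructType =>
      // Group the fields of this struct by lower-cased name and keep
      // only the groups that contain more than one field.
      val here = st.fields.toSeq
        .groupBy(_.name.toLowerCase)
        .values
        .filter(_.size > 1)
        .map(fs => fs.map(f => prefix + f.name).mkString(" / "))
        .toSeq
      // Recurse into every child field, extending the dotted path.
      here ++ st.fields.toSeq.flatMap(f => find(f.dataType, prefix + f.name + "."))
    case ArrayType(elementType, _) =>
      // Arrays add no name of their own; inspect the element type.
      find(elementType, prefix)
    case _ =>
      Seq.empty
  }
}
```

Calling `SchemaCollisions.find(df.schema)` on the dataframe being flattened (here `df` stands for whatever dataframe you have) should list each colliding pair with its dotted path, which narrows down where to rename or drop a field.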

1 Answer


This can happen when you join two dataframes that both have a field with the same name. When you reference the duplicated field, Spark doesn't know which column you are requesting. Solution: rename the field on one side of the join. Example:

  • dfA is a dataframe with 2 columns => (id,name)
  • dfB is a dataframe with 3 columns => (id,name,description)

You join the two dataframes on the column "id" and want to select the "name" column of the second one:

// Fails: "name" exists in both dfA and dfB, so the reference is ambiguous
val dfJoined = dfA.join(dfB, Seq("id"), "inner").select("name")

Because the column "name" exists in both dataframes, Spark cannot tell which "name" you are asking for. Renaming it in dfB before the join removes the ambiguity:


val dfRenamedB = dfB.withColumnRenamed("name", "b_name")
val dfJoined = dfA.join(dfRenamedB, Seq("id"), "inner").select("b_name")

Now, when you join the two dataframes, you get the columns "name" (from dfA) and "b_name" (from dfB), so you can select either one unambiguously.
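To try the rename-before-join approach end to end, here is a minimal sketch. It assumes a local SparkSession, and the sample rows, the object name `RenameBeforeJoin` and the helper `joinRenamed` are invented for illustration; only the column names come from the example above.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object RenameBeforeJoin {
  // Renames "name" in dfB to "b_name", then joins on "id",
  // so both name columns stay individually addressable.
  def joinRenamed(dfA: DataFrame, dfB: DataFrame): DataFrame = {
    val dfRenamedB = dfB.withColumnRenamed("name", "b_name")
    dfA.join(dfRenamedB, Seq("id"), "inner")
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a throwaway local session is good enough for a demo.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("rename-before-join")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample data with the column layout described above.
    val dfA = Seq((1, "a1"), (2, "a2")).toDF("id", "name")
    val dfB = Seq((1, "b1", "first"), (2, "b2", "second"))
      .toDF("id", "name", "description")

    // Both name columns can now be selected without ambiguity.
    joinRenamed(dfA, dfB).select("id", "name", "b_name").show()

    spark.stop()
  }
}
```

Renaming before the join keeps the fix in one place. If you would rather keep the original column names, aliasing each dataframe and qualifying the column with the alias is another common way around the same error.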