0
votes

I get this exception in a Spark application submitted with spark-submit (2.4.0):

User class threw exception: org.apache.spark.sql.AnalysisException: Multiple sources found for parquet (org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat, org.apache.spark.sql.execution.datasources.parquet.DefaultSource), please specify the fully qualified class name.;

My application is:

val sparkSession = SparkSession.builder()
      .appName(APP_NAME)
      .config("spark.sql.warehouse.dir", warehouseLocation)
      .enableHiveSupport()
      .getOrCreate()
sparkSession.sql(query)

I'm unable to figure out where this duplicate source for parquet is coming from.

Here is my spark-submit:

spark-submit-2.4.0 --master yarn-cluster \
  --files="/etc/hive/hive-site.xml" \
  --driver-class-path="/etc/hadoop/:/usr/lib/spark-packages/spark2.4.0/jars/:/usr/lib/spark-packages/spark2.4.0/lib/spark-assembly.jar:/usr/lib/hive/lib/"

Any suggestions?

1
Hello @BiN, try not to put more parquet jars on the classpath than the ones Spark already ships with. The folders you are including via spark-submit likely contain some of the same jars in different versions. – abiratsis
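One way to check the comment's theory is to list every parquet jar reachable from the classpath directories passed to spark-submit; seeing the same jar in more than one version is the usual trigger for this AnalysisException. A minimal sketch (the `/tmp/cp_demo` directories and jar names below are hypothetical stand-ins for the real classpath roots, e.g. `/usr/lib/spark-packages/spark2.4.0/jars/` and `/usr/lib/hive/lib/`):

```shell
# Simulate two classpath directories that both ship a parquet jar
mkdir -p /tmp/cp_demo/dir1 /tmp/cp_demo/dir2
touch /tmp/cp_demo/dir1/parquet-hadoop-1.10.0.jar
touch /tmp/cp_demo/dir2/parquet-hadoop-1.8.1.jar

# List every parquet jar under the classpath roots; more than one
# distinct version is a red flag for the "Multiple sources" error
find /tmp/cp_demo -name '*parquet*.jar'
```

Run the same `find` against your actual `--driver-class-path` entries and remove or align any duplicates it reports.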

1 Answer

1
votes

There was a mixup between the version of spark-submit (2.4) I was using and the default SPARK_HOME, which pointed to an older Spark version. Posting this in case anyone else hits the same issue.
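A quick sanity check for this kind of mixup (a sketch, not from the original post) is to compare the spark-submit binary found on PATH with what SPARK_HOME points to; if the two resolve to different installations, the driver can end up loading jars from both:

```shell
# Which spark-submit does the shell actually run, and where does SPARK_HOME point?
echo "spark-submit on PATH: $(command -v spark-submit || echo '<none>')"
echo "SPARK_HOME: ${SPARK_HOME:-<not set>}"

# If both exist, their reported versions should agree:
# "$SPARK_HOME/bin/spark-submit" --version
# spark-submit --version
```

If the versions differ, either fix PATH/SPARK_HOME to point at the same installation or invoke the versioned binary (e.g. `spark-submit-2.4.0`) with SPARK_HOME set to match.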