
I am trying to create an external table to read data from partitioned Parquet files in HDFS. To do so, I first create the external table with this statement:

spark.sql(
  """CREATE EXTERNAL TABLE IF NOT EXISTS mydb.mytable (col1 int)
    |PARTITIONED BY (yyyy int, mm int)
    |STORED AS PARQUET
    |LOCATION 'hdfs://group/poc/mydata'""".stripMargin
)

where spark is a SparkSession created with these two options:

import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

// Dynamic partition settings required for partitioned inserts.
// Note: the correct mode value is "nonstrict", not "nonrestrict".
val sparkConfigurations = new SparkConf()
  .set("hive.exec.dynamic.partition", "true")
  .set("hive.exec.dynamic.partition.mode", "nonstrict")

val spark = SparkSession
  .builder()
  .enableHiveSupport()
  .config(sparkConfigurations)
  .getOrCreate()

Then I try to insert data into this table from a DataFrame:

import org.apache.spark.sql.SaveMode

df.write
  .mode(SaveMode.Append)
  .insertInto("mydb.mytable")

where df is a DataFrame with the same schema as the Hive table.
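For completeness, here is a minimal sketch of such a DataFrame. The sample values are made up; the point is that insertInto matches columns by position, so the partition columns (yyyy, mm) must come last:

import spark.implicits._

// Hypothetical sample rows: data column first, partition columns last,
// in the positional order insertInto expects.
val df = Seq((1, 2017, 1), (2, 2017, 2)).toDF("col1", "yyyy", "mm")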

This last insertInto call raises a NullPointerException, with no further information.

Strangest of all, if I run the same CREATE EXTERNAL TABLE statement from Hive instead, the insertInto method starts working.

PS: I cannot use the saveAsTable method because I am using Spark 2.1.0, and this method is not supported until version 2.2.0.

Thanks for your help.

Please include the traceback. - zero323
It returns a NullPointerException, so there is not much to go on in the log. I have already been digging into it, and the problem happens exactly as I explained - James
And how should we know which component throws the NPE? - zero323
the insertInto method throws the NPE - James
That doesn't really narrow things down much. - zero323

1 Answer


I have found the problem...

When I create the Hive table with spark.sql, it adds some extra metadata in the form of TBLPROPERTIES. Among these properties were the partition columns I was using, but in uppercase, while the column names themselves are in lowercase.

That mismatch was causing the NPE, so once I changed everything to lowercase it started to work.
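If you want to verify what spark.sql actually wrote, a quick sketch (reusing the table name from the question) is to dump the full table metadata and check the case of the partition columns and the TBLPROPERTIES entries:

// Inspect the metadata stored for the table, including TBLPROPERTIES;
// the partition column names should appear in lowercase.
spark.sql("DESCRIBE FORMATTED mydb.mytable").show(100, false)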