I am trying to create an external table to read data from partitioned Parquet files in HDFS. To do so, I first create the external table with this statement:
spark.sql(
  """CREATE EXTERNAL TABLE IF NOT EXISTS mydb.mytable (col1 int)
    |PARTITIONED BY (yyyy int, mm int)
    |STORED AS PARQUET
    |LOCATION 'hdfs://group/poc/mydata'""".stripMargin
)
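To confirm the table registered with the expected location, the metadata can be checked right after creation (a minimal sketch):

// Sketch: list the table's columns, partitioning and HDFS location.
spark.sql("DESCRIBE FORMATTED mydb.mytable").show(100, truncate = false)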
where spark is a SparkSession created with these two options:

import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

val spark = SparkSession
  .builder()
  .enableHiveSupport()
  .config(sparkConfigurations)
  .getOrCreate()

// Hive settings required for dynamic-partition inserts.
def sparkConfigurations: SparkConf = {
  new SparkConf()
    .set("hive.exec.dynamic.partition", "true")
    .set("hive.exec.dynamic.partition.mode", "nonstrict")
}
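As far as I know, the same two settings can equivalently be passed to the builder key by key; a minimal sketch, assuming the same values as above:

// Alternative: set each Hive option directly on the SparkSession builder.
val spark = SparkSession
  .builder()
  .enableHiveSupport()
  .config("hive.exec.dynamic.partition", "true")
  .config("hive.exec.dynamic.partition.mode", "nonstrict")
  .getOrCreate()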
Then I try to insert data into this table from a DataFrame:
import org.apache.spark.sql.SaveMode

df.write
  .mode(SaveMode.Append)
  .insertInto("mydb.mytable")
where df is a DataFrame with the same schema as the Hive table.
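For illustration only, a hypothetical df could look like the sketch below; the values are made up, but the column order matters: insertInto resolves columns by position rather than by name, so the partition columns (yyyy, mm) must come last:

// Hypothetical DataFrame matching the table schema, partition columns last.
import spark.implicits._

val df = Seq(
  (1, 2017, 6),
  (2, 2017, 7)
).toDF("col1", "yyyy", "mm")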
This last insertInto call raises a NullPointerException, with no further detail.
Strangest of all, if I run the same CREATE EXTERNAL TABLE statement from Hive instead, the insertInto method starts working fine.
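That is, running the very same DDL from the Hive shell, with nothing else changed:

hive> CREATE EXTERNAL TABLE IF NOT EXISTS mydb.mytable (col1 int)
    > PARTITIONED BY (yyyy int, mm int)
    > STORED AS PARQUET
    > LOCATION 'hdfs://group/poc/mydata';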
PS: I cannot use the saveAsTable method, because I am on Spark 2.1.0 and that method is not supported for this case until version 2.2.0.
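(To be clear about what I am ruling out, this is the kind of call I mean; shown only as a sketch of the 2.2.0+ alternative:)

// The saveAsTable-based alternative I cannot use on 2.1.0 (sketch only).
df.write
  .mode(SaveMode.Append)
  .partitionBy("yyyy", "mm")
  .saveAsTable("mydb.mytable")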
Thanks for your help.