
I am using a Spark DataFrame to insert into a Hive table. Although the application is submitted as user 'myuser', some of the Hive staging part files get created as user 'mapr', so the final write into the Hive table fails with an access-denied error when the staging files are renamed. Command:

resultDf.write.mode("append").insertInto(insTable)

Error:

Exception in thread "main" org.apache.hadoop.security.AccessControlException: User myuser(user id 2547) has been denied access to rename /ded /data/db/da_mydb.db/managed/da_primary/.hive-staging_hive_2017-12-27_13-25-22_586_3120774356819313410-1/-ext-10000/_temporary/0/task_201712271325_0080_m_000000/part-00000 to /ded /data/db/da_mydb.db/managed/da_primary/.hive-staging_hive_2017-12-27_13-25-22_586_3120774356819313410-1/-ext-10000/part-00000
    at com.mapr.fs.MapRFileSystem.rename(MapRFileSystem.java:1112)
    at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.mergePaths(FileOutputCommitter.java:461)
    at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.mergePaths(FileOutputCommitter.java:475)
    at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitJobInternal(FileOutputCommitter.java:392)
    at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitJob(FileOutputCommitter.java:364)
    at org.apache.hadoop.mapred.FileOutputCommitter.commitJob(FileOutputCommitter.java:136)
    at org.apache.spark.sql.hive.SparkHiveWriterContainer.commitJob(hiveWriterContainers.scala:108)
    at org.apache.spark.sql.hive.execution.InsertIntoHiveTable.saveAsHiveFile(InsertIntoHiveTable.scala:85)
    at org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult$lzycompute(InsertIntoHiveTable.scala:201)
    at org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult(InsertIntoHiveTable.scala:127)
    at org.apache.spark.sql.hive.execution.InsertIntoHiveTable.doExecute(InsertIntoHiveTable.scala:276)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:132)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:130)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
    at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130)
    at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:55)
    at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:55)
    at org.apache.spark.sql.DataFrameWriter.insertInto(DataFrameWriter.scala:189)
    at org.apache.spark.sql.DataFrameWriter.insertInto(DataFrameWriter.scala:166)
    at com.iri.suppChain.RunKeying$.execXForm(RunKeying.scala:74)
    at com.iri.suppChain.RunKeying$$anonfun$1.apply(RunKeying.scala:36)
    at com.iri.suppChain.RunKeying$$anonfun$1.apply(RunKeying.scala:36)
    at scala.collection.immutable.List.foreach(List.scala:318)
    at com.iri.suppChain.RunKeying$delayedInit$body.apply(RunKeying.scala:36)
    at scala.Function0$class.apply$mcV$sp(Function0.scala:40)
    at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:12)

Below are the environment details:

  • Spark 1.6.1
  • Distribution: MapR
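
To narrow down where the identity switch happens, a minimal diagnostic sketch using the standard Hadoop UserGroupInformation API (run it on the driver and inside an executor task, since the staging part files are written by executor processes):

import org.apache.hadoop.security.UserGroupInformation

// Identity of the local JVM process
println(s"JVM user:    ${System.getProperty("user.name")}")
// Identity Hadoop uses for filesystem operations (e.g. the staging renames)
println(s"Hadoop user: ${UserGroupInformation.getCurrentUser.getShortUserName}")

If the executors report 'mapr' while the driver reports 'myuser', the problem is the identity the executor processes run under, not the write itself.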

1 Answer


Try the following and give feedback:

// Register the DataFrame as a temporary table, then insert through HiveQL.
// Note: insTable is a Scala variable holding the table name, so it must be
// interpolated into the SQL string.
resultDf.registerTempTable("results_tbl")
sqlContext.sql(s"INSERT INTO TABLE $insTable SELECT * FROM results_tbl")
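
Routing the insert through HiveQL lets Hive manage the staging directory and the final rename itself, rather than going through Spark's FileOutputCommitter (the commitJob/mergePaths frames in your trace). If the staging files still come out owned by 'mapr', it may also help to point Hive's staging directory at a location the submitting user owns. A sketch, assuming Hive 1.1+ (which introduced hive.exec.stagingdir); the path below is hypothetical:

// Hypothetical path; any directory writable by 'myuser' should work.
sqlContext.setConf("hive.exec.stagingdir", "/user/myuser/.hive-staging")

With the staging tree under a directory 'myuser' owns, the final rename should no longer hit the permission check that failed above.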