I have Spark jobs on EMR, and EMR is configured to use the Glue catalog for Hive and Spark metadata.
I create Hive external tables, and they appear in the Glue catalog, and my Spark jobs can reference them in Spark SQL like spark.sql("select * from hive_table ...")
Now, when I run the same code in a Glue job, it fails with a "table not found" error. It looks like Glue jobs do not use the Glue catalog for Spark SQL the same way Spark SQL does when running on EMR.
I can work around this by using Glue APIs and registering dataframes as temp views:
create_dynamic_frame_from_catalog(...).toDF().createOrReplaceTempView(...)
but is there a way to do this automatically?
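To avoid repeating that boilerplate per table, the workaround can be wrapped in a small helper that registers each catalog table as a temp view. This is a sketch, not Glue-specific API beyond what is shown above; `register_catalog_tables`, the database name, and the table list are my own illustrative names, and the table names must still be supplied explicitly (it is not automatic discovery):

```python
def register_catalog_tables(glue_context, database, table_names):
    """Register each Glue Catalog table as a Spark temp view so that
    subsequent spark.sql("select * from <name> ...") calls resolve.

    glue_context : an initialized GlueContext
    database     : Glue Catalog database name (assumed, adjust to yours)
    table_names  : iterable of catalog table names to expose to Spark SQL
    """
    for name in table_names:
        df = glue_context.create_dynamic_frame_from_catalog(
            database=database, table_name=name
        ).toDF()
        # View name matches the catalog table name, so existing SQL works.
        df.createOrReplaceTempView(name)
```

Called once at job start (e.g. `register_catalog_tables(glueContext, "my_db", ["hive_table"])`), the rest of the Spark SQL can stay unchanged.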
For context, the Spark session in the Glue job is created like this:
glueContext = GlueContext(SparkContext.getOrCreate())
spark = glueContext.spark_session
– wrschneider