Using Spark 2.4 and Hive 3.1.0 in HDP 3.1 , I am trying to read managed table from hive using spark sql, but it returns an empty dataframe, while it could read an external table easily. How can i read the managed table from hive by spark sql?
Note: The hive maanged table is not empty when reading it usig the hive client.
1- I tried to format the table by ORC an parquet and it failed in both.
2- I failed to read it using HWC.
3- I failed to read it when using JDBC.
os.environ["HADOOP_USER_NAME"] = 'hdfs'
spark = SparkSession\
.builder\
.appName('NHIC')\
.config('spark.sql.warehouse.dir', 'hdfs://192.168.1.65:50070/user/hive/warehouse')\
.config("hive.metastore.uris", "thrift://192.168.1.66:9083")\
.enableHiveSupport()\
.getOrCreate()
HiveTableName ='nhic_poc.nhic_data_sample_formatted'
data = spark.sql('select * from '+HiveTableName)
The expected is to return the dataframe with Data but Actually the dataframe is empty.
spark.sql("describe formatted nhic_poc.nhic_data_sample_formatted").show(1000,False)
and see if it works? – mazaneichaspark.sql.warehouse.dir
from 50070 (http) to 8020 (rpc), or remove this config completely (hive.metastore.uris
should be sufficient to use external metastore). Next, I'd verify thespark.sql.hive.metastore.version
against your actual Hive metastore. – mazaneicha