2 votes

I have a Hive table (say table1) in Avro file format with 1900 columns. When I query the table in Hive, I am able to fetch data, but when I query the same table in Spark SQL I get "Metastore client lost connection. Attempting to reconnect".

I have also queried another Hive table (say table2) in Avro file format with 130 columns; it fetches data in both Hive and Spark.

What I observed is that I can see data in the HDFS location of table2, but I can't see any data in the HDFS location of table1 (yet it fetches data when I query it in Hive).
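
For reference, a minimal sketch of the behaviour described above; the table names are the placeholders used in the question, and the actual queries may of course differ:

    -- Returns rows both in the Hive CLI / Beeline and in Spark SQL (table2, ~130 columns):
    SELECT * FROM table2 LIMIT 10;

    -- Returns rows in the Hive CLI / Beeline, but from Spark SQL fails with
    -- "Metastore client lost connection. Attempting to reconnect" (table1, ~1900 columns):
    SELECT * FROM table1 LIMIT 10;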

Are you getting the "Metastore client lost connection" error every time you query table1 from Spark? It's worth checking the availability of the HMS and the back-end DB when you see the error. - Gomz

2 Answers

0 votes
  1. Splits tell you the number of mappers in the MR job.
  2. They don't show you the exact location from which the data was picked.
0 votes

The below will help you check where the data for table1 is stored in HDFS.

For table1: you can check the location of its data in HDFS by running a SELECT query with a WHERE condition in Hive, using MapReduce as the execution engine. Once the job is complete, check the map task logs of the YARN application (specifically for the text "Processing file") to find where the input data files were read from.
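
A minimal sketch of that check; "some_column" is a placeholder for one of table1's actual columns, and the filter itself does not matter as long as it forces the data files to be read:

    -- Force Hive to run the query as a MapReduce job on YARN:
    SET hive.execution.engine=mr;

    -- Any query with a WHERE condition that reads the data files:
    SELECT COUNT(*) FROM table1 WHERE some_column IS NOT NULL;

    -- After the job completes, open the map task logs of the YARN application
    -- (via the ResourceManager UI or `yarn logs -applicationId <application_id>`)
    -- and search for the text "Processing file" to see which HDFS paths were read.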

Also, try checking the location of the data for both tables as recorded in the Hive Metastore by running "SHOW CREATE TABLE <table_name>;" in Hive for each table. In the result, check the "LOCATION" detail.
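
For example, with the placeholder table names from the question:

    -- Compare the LOCATION clause printed for each table:
    SHOW CREATE TABLE table1;
    SHOW CREATE TABLE table2;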