I have a complex (nested) Hive external table created on top of HDFS (the files are in Avro format). When I run a Hive query against it, it shows all records and partitions.
However when I use the same table in Spark:
val df = spark
  .read
  .format("avro")
  .option("avroSchema", Schema.toString)
  .load("avro_files")
the resulting DataFrame does not include the partition column. (Note that .option must be called before .load; calling it afterwards would not compile, since load returns a DataFrame.)
However, when I use spark.sql("select * from hive_External_Table"), the partition column does appear in the resulting DataFrame; the problem with that approach is that I cannot manually pass my own schema.
Please note that when I looked at the underlying saved data, the partition column is not part of it, yet I can see it when I query the table through Hive. I can also see the partition column when I load the Avro files using PySpark:
df = (sqlContext.read
      .format("com.databricks.spark.avro")
      .option("avroSchema", pegIndivSchema)
      .load('avro_files'))
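For context, my understanding is that Hive-style partition columns are encoded in the directory names (e.g. a `col=value` path segment), not stored inside the Avro files themselves, which is why the column is absent from the raw data. A toy sketch of how such paths are interpreted (the partition name `load_date` here is purely hypothetical, not from my actual table):

```python
def discover_partitions(path):
    """Extract key=value partition segments from an HDFS-style path,
    mimicking how Hive-style partition discovery reads directory names."""
    return dict(seg.split("=", 1) for seg in path.split("/") if "=" in seg)

# Hypothetical file layout under the table's root directory
print(discover_partitions("avro_files/load_date=2020-01-01/part-00000.avro"))
# → {'load_date': '2020-01-01'}
```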
So I was wondering why this happens, and how I can load the files with my own schema while still getting the partition column?