
I'm having a weird problem with Spark SQL on an external table defined in Hive with

CREATE EXTERNAL TABLE ... STORED AS PARQUET... LOCATION 'hdfs://path/TABLENAME'

If I refer to the table in Spark with spark.table("tablename") or spark.sql("select column from tablename"), I get the right row count, but every value is null.

When I query the table through Beeline, I get the right values.

Additionally, if I read the Parquet files directly in Spark with spark.read.parquet("hdfs://path/TABLENAME"), I also get the right values.

Stranger still: if I create another external table with a similar CREATE EXTERNAL TABLE... statement against the same Parquet files in HDFS, Spark SQL works.

Where do I look next?
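One thing worth checking first is whether the Hive metastore schema and the Parquet file footer agree on column names. Parquet column resolution is case-sensitive in some Spark configurations, while Hive lowercases identifiers, and a spelling mismatch can yield exactly this symptom: correct row counts but all-null columns. A minimal sketch of the comparison, with hypothetical column lists (in practice you would take them from spark.table("tablename").schema.names and spark.read.parquet("hdfs://path/TABLENAME").schema.names):

```python
# Hypothetical example column lists -- substitute the real schema names
# pulled from spark.table(...) and spark.read.parquet(...).
hive_cols = ["customer_id", "order_total"]     # Hive lowercases identifiers
parquet_cols = ["customer_Id", "order_total"]  # names as written in the files

def mismatched_columns(hive_cols, parquet_cols):
    """Report columns that match case-insensitively but are spelled
    differently case-sensitively -- a common cause of all-null reads."""
    parquet_by_lower = {c.lower(): c for c in parquet_cols}
    return [
        (h, parquet_by_lower[h.lower()])
        for h in hive_cols
        if h.lower() in parquet_by_lower and h != parquet_by_lower[h.lower()]
    ]

print(mismatched_columns(hive_cols, parquet_cols))
# [('customer_id', 'customer_Id')]
```

If this reports any pairs, the metastore schema and the file schema disagree, which would also explain why a freshly created external table (whose schema happens to match) works.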

try adding the schema in the select statement. - Amar Singh
I clarified that it doesn't matter whether I use select * or a named column. - wrschneider

1 Answer


I faced a similar issue running Hive SQL from Ab Initio. My SQL did not have an alias for each field; when I added the aliases, it worked.
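For illustration, the alias workaround described above looks like this (table and column names are hypothetical, not from the original question):

```sql
-- Without aliases the query returned nulls; aliasing every column
-- explicitly pins the output names to what the consumer expects.
SELECT t.customer_id  AS customer_id,
       t.order_total  AS order_total
FROM tablename t;
```

Explicit aliases can paper over a mismatch between the names the engine resolves in the files and the names the metastore advertises, which is consistent with the symptoms in the question.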