I'm saving a dataframe to parquet files. The schema generated looks like this:
org.apache.spark.sql.parquet.row.metadata:
{
  "type": "struct",
  "fields": [
    { "name": "DCN",               "type": "string", "nullable": true, "metadata": {} },
    { "name": "EDW_id",            "type": "string", "nullable": true, "metadata": {} },
    { "name": "recievedTimestamp", "type": "string", "nullable": true, "metadata": {} },
    { "name": "recievedDate",      "type": "date",   "nullable": true, "metadata": {} },
    { "name": "rule",              "type": "string", "nullable": true, "metadata": {} }
  ]
}
The dataframe is being generated in a Spark program; when I run it via spark-submit and display the dataframe, I can see there are several hundred records. I'm writing it to parquet like so:
df.write.format("parquet").mode(SaveMode.Overwrite).save("/home/my/location")
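To rule out an empty write, the directory can be read back and counted from the same job (a minimal sketch, assuming the standard Spark 1.6 sqlContext that produced df is in scope):

// Hypothetical sanity check: read the freshly written parquet files back
val readBack = sqlContext.read.parquet("/home/my/location")
println(readBack.count())   // should match the row count seen before the write
readBack.printSchema()      // should match the schema shown above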
And creating an external table in hive like so:
CREATE EXTERNAL TABLE schemaname.tablename (
DCN STRING,
EDW_ID STRING,
RECIEVEDTIMESTAMP STRING,
RECIEVEDDATE STRING,
RULE STRING)
STORED AS PARQUET
LOCATION '/home/my/location';
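The table can also be inspected from Spark itself (a minimal sketch, assuming a HiveContext built over the existing SparkContext sc), mainly to double-check the location Hive resolved and whether it sees any rows there:

import org.apache.spark.sql.hive.HiveContext

// Hypothetical check against the Hive metastore
val hiveContext = new HiveContext(sc)
hiveContext.sql("DESCRIBE FORMATTED schemaname.tablename").show(100, false)
hiveContext.sql("SELECT COUNT(*) FROM schemaname.tablename").show()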
The table is created successfully, but it isn't populated with any data: when I query it, 0 records are returned. Can anyone spot what I'm doing wrong? This is with Hive 1.1 and Spark 1.6.