I'm saving a dataframe to parquet files. The schema generated looks like this:

org.apache.spark.sql.parquet.row.metadata:

{
  "type": "struct",
  "fields": [
    {
      "name": "DCN",
      "type": "string",
      "nullable": true,
      "metadata": {}
    },
    {
      "name": "EDW_id",
      "type": "string",
      "nullable": true,
      "metadata": {}
    },
    {
      "name": "recievedTimestamp",
      "type": "string",
      "nullable": true,
      "metadata": {}
    },
    {
      "name": "recievedDate",
      "type": "date",
      "nullable": true,
      "metadata": {}
    },
    {
      "name": "rule",
      "type": "string",
      "nullable": true,
      "metadata": {}
    }
  ]
}

The DataFrame is generated in a Spark program; when I run it via spark-submit and display the DataFrame, I can see several hundred records. I'm saving the DataFrame to Parquet like so:

import org.apache.spark.sql.SaveMode

df.write.format("parquet").mode(SaveMode.Overwrite).save("/home/my/location")

And I'm creating an external table in Hive like so:

CREATE EXTERNAL TABLE schemaname.tablename (
  DCN STRING,
  EDW_ID STRING,
  RECIEVEDTIMESTAMP STRING,
  RECIEVEDDATE STRING,
  RULE STRING) 
STORED AS PARQUET
LOCATION '/home/my/location';

The table is created successfully, but it is not populated with any data: when I query it, 0 records are returned. Can anyone spot what I'm doing wrong? This is with Hive 1.1 and Spark 1.6.
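
A quick way to confirm that the Parquet files themselves contain the rows (a sketch, assuming the same sqlContext used in the program) is to read them back:

// Read the freshly written Parquet files back and count the rows;
// `sqlContext` is the SQLContext already used in the Spark program.
val readBack = sqlContext.read.parquet("/home/my/location")
println(readBack.count())  // should print the number of rows that were saved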


2 Answers

Answer 1 (0 votes):
Hive requires a jar file for handling Parquet files:

1. First, download parquet-hive-bundle-1.5.0.jar.

2. Include the jar path in hive-site.xml:

<property>
   <name>hive.jar.directory</name>
   <value>/home/hduser/hive/lib/parquet-hive-bundle-1.5.0.jar</value>
</property>
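
A session-level alternative (a sketch, assuming the jar sits at the path from step 1) is to register it with ADD JAR, which avoids editing hive-site.xml:

-- Register the bundle for the current Hive session only;
-- the path below is the download location assumed in step 1.
ADD JAR /home/hduser/hive/lib/parquet-hive-bundle-1.5.0.jar;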
Answer 2 (0 votes):
The Hive metastore is case-insensitive and stores all column names in lower case, whereas Parquet stores them as-is. Try recreating the Hive table so the column names' case matches the Parquet schema.
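
Since the metastore lower-cases names no matter how the DDL is written, one way to make the two sides agree (a minimal sketch, assuming the DataFrame from the question) is to lower-case the column names on the Spark side before writing:

// Rewrite the Parquet files with lower-cased column names so the schema
// matches the lower-cased names stored in the Hive metastore.
// `df` is the DataFrame from the question.
val lowered = df.toDF(df.columns.map(_.toLowerCase): _*)
lowered.write.format("parquet").mode(SaveMode.Overwrite).save("/home/my/location")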