I am getting a "java.lang.ClassNotFoundException: com.bizo.hive.serde.csv.CSVSerde" exception when trying to query a Hive table with these properties: ROW FORMAT SERDE 'com.bizo.hive.serde.csv.CSVSerde' STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
1 Answer
The solution is to add the SerDe jar file when you launch your Spark command.
I had the same problem: I could not connect Spark to a Hive table stored in CSV format, although Spark worked perfectly with other Hive tables.
After reading through your post and Rao's comment, I realized it should be a missing jar issue.
Step 1: Download a jar file (csv-serde-1.1.2-0.11.0-all.jar) from here
Step 2: Then run spark-submit or spark-shell or pyspark with this jar. I use pyspark:
pyspark --deploy-mode client --master yarn --jars /your/jar/path/csv-serde-1.1.2-0.11.0-all.jar
Step 3: Test your Spark + Hive connection:
from pyspark.sql import HiveContext
sqlContext = HiveContext(sc)
hiveTableRdd = sqlContext.sql("SELECT * FROM hiveDatabase.hiveTable")
hiveTableRdd.show()
Now it should work.
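If the ClassNotFoundException still shows up, it may be worth confirming that the jar passed to --jars really contains the SerDe class. Below is a minimal sketch using only the Python standard library. The helper name jar_contains_class is made up for illustration, and the path in the usage comment is the same placeholder as in Step 2, not a real location:

```python
import zipfile

def jar_contains_class(jar_path, class_name):
    """Return True if the jar (a zip archive) has an entry for class_name."""
    # A class a.b.C is stored inside the jar as a/b/C.class
    entry = class_name.replace(".", "/") + ".class"
    with zipfile.ZipFile(jar_path) as jar:
        return entry in jar.namelist()

# Usage (hypothetical path, same placeholder as in Step 2):
# jar_contains_class("/your/jar/path/csv-serde-1.1.2-0.11.0-all.jar",
#                    "com.bizo.hive.serde.csv.CSVSerde")
```

If this returns False, the jar is probably not the right one; if it returns True, the problem is more likely in how the jar is passed to Spark.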
Note: I used 'com.bizo.hive.serde.csv.CSVSerde' because the data was double-quoted:
"ID1","A,John","25.6"
"ID2","B,Mike","29.1"
"ID3","C,Tony","27.3"
...
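To illustrate why quote-aware parsing matters for rows like these, here is a small sketch in plain Python (not Spark or Hive code). It compares a naive comma split, which is roughly what a delimiter-only text table would do, with the standard csv module, which honors the quote character. The variable names are made up for the example:

```python
import csv
import io

sample = (
    '"ID1","A,John","25.6"\n'
    '"ID2","B,Mike","29.1"\n'
    '"ID3","C,Tony","27.3"\n'
)

# Naive split: the comma inside the quoted name breaks the field in two.
naive_rows = [line.split(",") for line in sample.splitlines()]

# Quote-aware parsing keeps "A,John" together and strips the quotes.
parsed_rows = list(csv.reader(io.StringIO(sample), delimiter=",", quotechar='"'))

print(naive_rows[0])   # ['"ID1"', '"A', 'John"', '"25.6"']
print(parsed_rows[0])  # ['ID1', 'A,John', '25.6']
```

The naive split yields four fields instead of three, which is the kind of misalignment a quote-aware SerDe is meant to avoid.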
The Hive table definition using CSVSerde:
CREATE EXTERNAL TABLE hiveDatabase.hiveTable (
ID string,
Name string,
Value string
)
row format serde 'com.bizo.hive.serde.csv.CSVSerde'
with serdeproperties(
'separatorChar' = '\,'
,'quoteChar' = '\"')
stored as textfile
LOCATION
'/data/path/hiveTable';
classpath for the missing library. - Rao