Issue while trying to query a CSV-format Hive table through spark-sql. Could anyone explain the reason?

Question

I am getting a "java.lang.ClassNotFoundException: com.bizo.hive.serde.csv.CSVSerde" exception when trying to query a Hive table that has the properties ROW FORMAT SERDE 'com.bizo.hive.serde.csv.CSVSerde' STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'

The jar file is present in /usr/lib/hadoop/, which Spark picks up, but I still get this error. I tried putting the jar in /usr/lib/spark/lib as well, but that didn't work either. It works only if I add the jar explicitly, like spark-shell --jars /path/to/csv. Is there a way to configure the environment permanently? Please provide an example if so. - codecian

kennyut · Accepted Answer · 2016-04-08T21:04:49

The solution is to pass the SerDe jar file when you launch your Spark command.

I had the same problem. I could not connect Spark to a Hive table in CSV format, although Spark worked perfectly with other Hive tables.

After reading through your post and Rao's comment, I realized it was a missing-jar issue.

Step 1: Download a jar file (csv-serde-1.1.2-0.11.0-all.jar) from here

Step 2: Run spark-submit, spark-shell, or pyspark with this jar. I use pyspark:

pyspark --deploy-mode client --master yarn --jars /your/jar/path/csv-serde-1.1.2-0.11.0-all.jar

Step 3: Test your Spark + Hive connection:

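from pyspark.sql import HiveContext

# `sc` is the SparkContext that the pyspark shell creates automatically
sqlContext = HiveContext(sc)
# Despite the variable name, sql() returns a DataFrame
hiveTableRdd = sqlContext.sql("SELECT * FROM hiveDatabase.hiveTable")
hiveTableRdd.show()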

Now it should work.
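
On the "permanent" part of the question: I have not tested this on your cluster, but Spark's `spark.jars` property is the configuration-file equivalent of `--jars`, so putting it in spark-defaults.conf should save you from typing the flag each time. Below is a small sketch that appends that entry. The helper names (`has_spark_jars`, `append_spark_jars`) are made up for illustration, and the location of your spark-defaults.conf is something you need to supply yourself.

```python
def has_spark_jars(text):
    """Return True if the config text already defines the spark.jars key."""
    for line in text.splitlines():
        parts = line.split()
        # Match the exact key so that e.g. spark.jars.packages is not counted.
        if parts and parts[0] == "spark.jars":
            return True
    return False


def append_spark_jars(conf_path, jar_paths):
    """Append a 'spark.jars a.jar,b.jar' line to conf_path.

    Returns False and leaves the file untouched if spark.jars is already set.
    conf_path is an assumption: pass the spark-defaults.conf your install reads.
    """
    try:
        with open(conf_path) as f:
            existing = f.read()
    except FileNotFoundError:
        existing = ""
    if has_spark_jars(existing):
        return False
    with open(conf_path, "a") as f:
        # Start on a fresh line if the file does not end with a newline.
        if existing and not existing.endswith("\n"):
            f.write("\n")
        f.write("spark.jars " + ",".join(jar_paths) + "\n")
    return True


# Example call (both paths are placeholders, not real locations):
# append_spark_jars("/path/to/spark/conf/spark-defaults.conf",
#                   ["/your/jar/path/csv-serde-1.1.2-0.11.0-all.jar"])
```

The helper refuses to overwrite an existing `spark.jars` entry, because that value is a single comma-separated list and merging it by hand is safer than appending a second line.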

Note: I used 'com.bizo.hive.serde.csv.CSVSerde' because the data was double-quoted:

"ID1","A,John","25.6"
"ID2","B,Mike","29.1"
"ID3","C,Tony","27.3"
...
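
To see why a quote-aware parser matters for this data, here is a small standalone sketch using Python's standard csv module (not Hive or the SerDe itself). It compares splitting on every comma, which is roughly what a plain comma-delimited table would do, with quote-aware parsing. The function names are only for illustration.

```python
import csv
import io

# Sample rows copied from the data shown above.
SAMPLE = '"ID1","A,John","25.6"\n"ID2","B,Mike","29.1"\n"ID3","C,Tony","27.3"\n'


def naive_split(line):
    """Split on every comma, ignoring the quotes."""
    return line.split(",")


def quote_aware_rows(text):
    """Parse with the csv module, which treats quoted commas as data."""
    return list(csv.reader(io.StringIO(text), delimiter=",", quotechar='"'))


first_line = SAMPLE.splitlines()[0]
print(naive_split(first_line))      # 4 pieces: the name is cut at its comma
print(quote_aware_rows(SAMPLE)[0])  # 3 fields: ['ID1', 'A,John', '25.6']
```

The naive split breaks "A,John" into two columns and leaves the quote characters in the values, which is the kind of misparse the CSV SerDe is there to avoid.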

The Hive table defined with CSVSerde:

CREATE EXTERNAL TABLE hiveDatabase.hiveTable (
ID string,
Name string,
Value string
)
row format serde 'com.bizo.hive.serde.csv.CSVSerde'
with serdeproperties(
 'separatorChar' = '\,'
,'quoteChar' = '\"')
stored as textfile
LOCATION
  '/data/path/hiveTable';
