Best place for json Serde JAR in CDH Hadoop for use with Hive/Hue/MapReduce

Question

I'm using Hive/Hue/MapReduce with a json Serde. To get this working I have copied the json_serde.jar to several lib directories on every cluster node:

/opt/cloudera/parcels/CDH/lib/hive/lib
/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/lib
/opt/cloudera/parcels/CDH/lib/hadoop/lib
/opt/cloudera/parcels/CDH/lib/hadoop-0.20-mapreduce/lib
...

On every CDH update of the cluster I have to do that again. Is there a more elegant way where the distribution of the Serde in the cluster would be automatic and resistant to updates?

T'Saavik T'Saavik · Accepted Answer · 2014-08-15T17:20:22

If using HiveServer2 (Default in Cloudera 5.0+) the following configuration will work across your entire cluster without having to copy the jar to each node.

In your hive-site.xml config file, or if you're using Cloudera Manager in the "HiveServer2 Advanced Configuration Snippet (Safety Valve) for hive-site.xml" config box

<property>
 <name>hive.aux.jars.path</name>
 <value>/user/hive/aux_jars/hive-serdes-1.0-snapshot.jar</value>
</property>

Then create the directory in your HDFS filesystem (/user/hive/aux_jars) and place the jar file in it. If you are running HUE you can do this part via the web UI, just click on File Browser at the top right.

Best place for json Serde JAR in CDH Hadoop for use with Hive/Hue/MapReduce

2 Answers