4
votes

I'm using Hive/Hue/MapReduce with a JSON SerDe. To get this working, I have copied json_serde.jar to several lib directories on every cluster node:

  • /opt/cloudera/parcels/CDH/lib/hive/lib
  • /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/lib
  • /opt/cloudera/parcels/CDH/lib/hadoop/lib
  • /opt/cloudera/parcels/CDH/lib/hadoop-0.20-mapreduce/lib
  • ...

After every CDH update of the cluster I have to do that again. Is there a more elegant way to distribute the SerDe across the cluster automatically, one that survives updates?


2 Answers

4
votes

If you are using HiveServer2 (the default in Cloudera 5.0+), the following configuration works across your entire cluster without having to copy the jar to each node.

Add this to your hive-site.xml config file or, if you're using Cloudera Manager, to the "HiveServer2 Advanced Configuration Snippet (Safety Valve) for hive-site.xml" config box:

<property>
 <name>hive.aux.jars.path</name>
 <value>/user/hive/aux_jars/hive-serdes-1.0-snapshot.jar</value>
</property>

Then create the directory /user/hive/aux_jars in your HDFS filesystem and place the jar file in it. If you are running Hue, you can do this part via the web UI: just click File Browser at the top right.
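If you prefer the command line to the Hue File Browser, the same step can be sketched with the standard `hdfs dfs` commands. This is only a rough sketch: the function name `upload_aux_jar` and the local jar location are made up for illustration, and it assumes you run it as a user with write access to /user/hive.

```shell
#!/usr/bin/env bash
# Sketch: copy a SerDe jar into the HDFS directory referenced by hive.aux.jars.path.
# upload_aux_jar is a hypothetical helper name, not part of Hive or CDH.

upload_aux_jar() {
  local jar="$1"                               # local path to the jar (assumption)
  local aux_dir="${2:-/user/hive/aux_jars}"    # HDFS directory from the config above

  # Create the target directory (no error if it already exists).
  hdfs dfs -mkdir -p "$aux_dir" || return 1
  # Upload the jar, overwriting an older copy if present.
  hdfs dfs -put -f "$jar" "$aux_dir/" || return 1
  # List the directory so you can confirm the jar arrived.
  hdfs dfs -ls "$aux_dir"
}

# Example call, commented out so sourcing this file has no side effects:
# upload_aux_jar ./hive-serdes-1.0-snapshot.jar
```

Because the jar lives in HDFS rather than under the CDH parcel directories, it should not be touched by a parcel upgrade. HiveServer2 will most likely need a restart before it picks up a changed hive.aux.jars.path.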

1
votes

It depends on the version of Hue and on whether you are using Beeswax or HiveServer2: