3
votes

I've created a Python UDF to convert datetimes into different timezones. The script uses pytz, which doesn't ship with Python (or Jython). I've tried a few things:

  1. Bootstrapping PIG to install its own jython and including pytz in that jython installation. I can't get PIG to use the newly installed jython; it keeps reverting to Amazon's jython.
  2. Setting PYTHONPATH to a local directory where the new modules have been installed.
  3. Setting HADOOP_CLASSPATH/PIG_CLASSPATH to the new installation of jython.

Each of these ends up with "ImportError: No module named pytz" when I try to load the UDF script. The script loads fine if I remove pytz so it's definitely the external module that's giving it problems.

Edit: Originally put this as a comment but I thought I'd just make it an edit:

I've tried every way I know of to get PIG to recognize another jython jar, and none of them has worked. Amazon's jython is here: /home/hadoop/.versions/pig-0.9.2/lib/pig/jython.jar, which is recognizing this sys.path: /home/hadoop/lib/Lib. I can't figure out how to build external libraries against this jar.

1
stackoverflow.com/questions/6811549/… may help you (they are trying to load a different module, but the method should be the same) - Chris White
Yes, I've tried to bootstrap the package to each slave. It worked but the problem is that I can't get PIG to use the jython jar that I've installed. Instead it always picks Amazon's jython jar which doesn't have any external libraries installed. - Bob Briski
I guess the runtime-resolved classpath has their jython jar ahead of yours - are you able to amend the hadoop-env.sh file? (I haven't worked with EMR, sorry) - Chris White
I haven't tried that yet but I have directly assigned the HADOOP_CLASSPATH and PIG_CLASSPATH on the line calling the pig executable like so: stackoverflow.com/questions/9300509 - Bob Briski

1 Answer

0
votes

Could you manually hack sys.path inside of your jython script?
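
A minimal sketch of what that might look like at the top of the UDF script. The directory name is hypothetical (it stands in for wherever your bootstrap step unpacked pytz), and the helper `add_to_sys_path` is just an illustrative name. The directory would presumably need to exist on every node where the script gets evaluated, and I haven't tried this on EMR.

```python
import sys

# Hypothetical directory: replace with wherever pytz was actually unpacked.
EXTRA_LIB_DIR = "/home/hadoop/extra-python-libs"


def add_to_sys_path(path):
    """Prepend path to sys.path unless it is already listed.

    Returns True if the path was added, False if it was already present.
    """
    if path not in sys.path:
        sys.path.insert(0, path)
        return True
    return False


# This must run before the import below, so the interpreter searches
# the extra directory when resolving pytz.
add_to_sys_path(EXTRA_LIB_DIR)

try:
    import pytz
except ImportError:
    # Still not found: the directory is missing or does not contain pytz.
    pytz = None
```

Since pytz is pure Python, pointing sys.path at its parent directory should be enough, with no need to swap out the jython jar. If `pytz` still ends up as `None`, printing `sys.path` from inside the script would show what the interpreter is really searching.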