
I use the getLastProcessedVal2 UDF in Hive to get the latest partitions from a table. This UDF is written in Java. I would like to use the same UDF from PySpark using a HiveContext.

dfsql_sel_nxt_batch_id_ini = sqlContext.sql(''' select l4_xxxx_seee.getLastProcessedVal2("/data/l4/work/hive/l4__stge/proctl_stg","APP_AMLMKTE_L1","L1_AMLMKT_MDWE","TRE_EXTION","2.1") ''')
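For reference, sqlContext above is assumed to be a HiveContext (a plain SQLContext cannot resolve Hive UDFs); a minimal setup sketch, with an illustrative app name:

from pyspark import SparkContext
from pyspark.sql import HiveContext

sc = SparkContext(appName="udf-example")  # app name is illustrative
sqlContext = HiveContext(sc)              # HiveContext gives access to Hive functions and the metastore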

Error:

ERROR exec.FunctionRegistry: Unable to load UDF class: java.lang.ClassNotFoundException:


2 Answers


Start your PySpark shell as:

pyspark --jars /path/to/udf.jar <all-other-param>

OR

submit your PySpark job with the --jars option:

spark-submit --jars /path/to/udf.jar <all-other-param>
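With the jar on the classpath, the UDF can also be registered explicitly from PySpark before it is called. A sketch, where the fully qualified class name is a placeholder for the actual UDF class inside your jar:

# 'com.example.udf.GetLastProcessedVal2' is a hypothetical class name; use the real one from your jar
sqlContext.sql("CREATE TEMPORARY FUNCTION getLastProcessedVal2 AS 'com.example.udf.GetLastProcessedVal2'")
# the function can then be called without the database qualifier
sqlContext.sql('''select getLastProcessedVal2("/data/l4/work/hive/l4__stge/proctl_stg","APP_AMLMKTE_L1","L1_AMLMKT_MDWE","TRE_EXTION","2.1")''')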


You could register a user-defined function using the SQLContext udf method. The first parameter is a string that sets the name under which the UDF is available in SQL queries.

e.g., using the Java API:

// requires: import org.apache.spark.sql.types.DataTypes;
sqlContext.udf().register("slen",
       (String arg1) -> arg1.length(),
       DataTypes.IntegerType);

sqlContext.sql("SELECT slen(name) FROM user").show();