I have a Hive table in which one column is an array of strings. I also have a set of custom UDFs that manipulate individual strings. I would like Hive to apply one of my custom UDFs to each element of the array and return the result as a modified array.
This seems like a simple requirement, but I wasn't able to find a simple solution for it. I found two possibilities, neither of which is really simple:
- Do some Hive SQL gymnastics with explode and LATERAL VIEW, invoke the UDF on each exploded element, then aggregate the results back into an array. This seems like overkill, as I don't see it executing in fewer than 2 MapReduce jobs (but I could be wrong here).
- Implement each of my UDFs as a GenericUDF that takes an array, processes each element in it, and returns an array. This requires a lot more development.
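For reference, the first option would look roughly like the sketch below. The names my_table, id, tags and my_udf are placeholders rather than my real schema, and I am assuming that id uniquely identifies a row and that collect_list is available (Hive 0.13 or later):

```sql
-- Sketch only: my_table, id, tags and my_udf are placeholder names.
-- Assumes id is unique per row, so grouping by it rebuilds one array per original row.
SELECT t.id,
       collect_list(my_udf(tag)) AS tags_modified  -- re-aggregate the transformed elements
FROM my_table t
LATERAL VIEW explode(t.tags) exploded AS tag       -- one output row per array element
GROUP BY t.id;
```

Besides the extra GROUP BY, this has two drawbacks as far as I can tell: collect_list does not guarantee that the elements come back in their original order, and rows whose array is empty or NULL are dropped unless LATERAL VIEW OUTER is used.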
Is there a simpler way to do this?