Google BigQuery now supports UDFs that work like mappers in MapReduce:

BigQuery supports user-defined functions (UDFs) written in JavaScript. A UDF is similar to the "Map" function in a MapReduce: it takes a single row as input and produces zero or more rows as output. The output can potentially have a different schema than the input.

From https://cloud.google.com/bigquery/user-defined-functions
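To make the row-in, rows-out model concrete, here is a minimal local simulation in plain Python. It is not the BigQuery API: `run_row_udf`, `split_tags`, and the `emit` callback convention are hypothetical names chosen for illustration. The UDF receives one row and may emit zero or more rows, and the emitted rows can have a different schema than the input.

```python
from typing import Any, Callable, Dict, Iterable, List

Row = Dict[str, Any]
Emit = Callable[[Row], None]
RowUdf = Callable[[Row, Emit], None]


def run_row_udf(udf: RowUdf, rows: Iterable[Row]) -> List[Row]:
    """Hypothetical driver: apply a mapper-style UDF to every row, collect emitted rows."""
    output: List[Row] = []
    for row in rows:
        # The UDF may call emit zero, one, or many times per input row.
        udf(row, output.append)
    return output


def split_tags(row: Row, emit: Emit) -> None:
    """Hypothetical UDF: emit one (id, tag) row per comma-separated tag.

    The output schema (id, tag) differs from the input schema (id, tags).
    """
    tags = [t.strip() for t in row["tags"].split(",") if t.strip()]
    for tag in tags:
        emit({"id": row["id"], "tag": tag})


rows = [
    {"id": 1, "tags": "sql,bigquery"},
    {"id": 2, "tags": ""},  # emits nothing
    {"id": 3, "tags": "udf"},
]
print(run_row_udf(split_tags, rows))
# → [{'id': 1, 'tag': 'sql'}, {'id': 1, 'tag': 'bigquery'}, {'id': 3, 'tag': 'udf'}]
```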

What's the motivation behind implementing UDFs that operate on rows, rather than UDFs that work as pure functions on columns/fields, like UDFs in Hive (https://cwiki.apache.org/confluence/display/Hive/HivePlugins)?

I guess you can express any UDF that works on a column (like a Hive UDF) as a UDF that works on rows (a BigQuery UDF), but not vice versa. You would do this by defining a BigQuery UDF whose output schema is the same as the dataset's input schema, with every value passed through unchanged except the field you want to apply your function to.
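Under the same assumptions (a plain Python simulation with hypothetical names, not BigQuery code), the pass-through idea can be sketched as a wrapper that turns a scalar, column-level function into a row-level UDF. Because Python dicts carry their own field names, this one wrapper works for any row shape; a real BigQuery UDF would still have to be defined against each dataset's schema, which this sketch does not model.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]
Emit = Callable[[Row], None]
RowUdf = Callable[[Row, Emit], None]


def lift_column_udf(field: str, fn: Callable[[Any], Any]) -> RowUdf:
    """Hypothetical helper: wrap a scalar function as a pass-through row UDF.

    Every field is copied unchanged except `field`, which is replaced by
    fn(value). Exactly one row is emitted per input row.
    """
    def row_udf(row: Row, emit: Emit) -> None:
        out = dict(row)  # shallow copy so the input row is not mutated
        out[field] = fn(row[field])
        emit(out)

    return row_udf


# Hypothetical usage: upper-case the "name" column, keep everything else.
upper_name = lift_column_udf("name", str.upper)
result: List[Row] = []
upper_name({"id": 7, "name": "ada", "score": 3.5}, result.append)
print(result)  # → [{'id': 7, 'name': 'ADA', 'score': 3.5}]
```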

Such a pass-through UDF is, of course, cumbersome to write if you want to apply the same function to different datasets with different schemas. Please help me understand.

1 Answer

The current implementation of UDFs in BigQuery is just the first step. As you note, it is the most generic approach if you want to be able to deal with nested and repeated structures, but it is cumbersome when all you need is a function on simple scalar values. Expect future improvements in this area that make simple UDFs simple.