
I need to call a function from my Spark SQL queries. I have tried a UDF, but I don't know how to set it up. Here is the scenario:

# my python function example

def sum(effdate, trandate):
  sum=effdate+trandate
  return sum

and my spark sql query is like:

spark.sql("select sum(cm.effdate, cm.trandate) as totalsum, name from CMLEdG cm ....").show()

These lines are not my actual code; I am only giving them as an example. How can I call my sum function inside spark.sql(sql queries) to get a result? Could you please suggest a link or an example compatible with PySpark?

Any help would be appreciated.

Thanks

Kalyan


2 Answers


Check this

    >>> from pyspark.sql.types import IntegerType
    >>> sqlContext.udf.register("stringLengthInt", lambda x: len(x), IntegerType())
    >>> sqlContext.sql("SELECT stringLengthInt('test')").collect()
    [Row(_c0=4)]

You just need to register your function as a UDF:

from pyspark.sql.types import IntegerType

# my python function example
def sum(effdate, trandate):
  return effdate + trandate

# note: registering under the name "sum" shadows the built-in
# SQL sum() aggregate for this session
spark.udf.register("sum", sum, IntegerType())
spark.sql("select sum(cm.effdate, cm.trandate) as totalsum, name from CMLEdG cm....").show()