
I am trying to convert my pandas code to PySpark DataFrames and apply a function to one column of the DataFrame. In pandas, I add a new column computed from a few other column values, as below:

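from datetime import date
from currency_converter import CurrencyConverter

c = CurrencyConverter()

def convert_USD_INR(row):
    # Convert the row's INR sales figure to USD at the rate for the 1st of that month
    USD_amount = c.convert(row['Sales'], 'INR', 'USD', date=date(row['Calendar year'], row['Calendar month'], 1))
    return USD_amount

# apply(..., axis=1) calls the function once per row
salesData['Sales (INR)'] = salesData.apply(convert_USD_INR, axis=1)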
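One way to make this logic easier to port is to separate the conversion itself from the way pandas hands over the data: `DataFrame.apply(..., axis=1)` passes each whole row, while a Spark UDF (as in the answer) receives the column values as separate arguments. The sketch below uses a hypothetical `convert_amount` function and a made-up fixed-rate table `RATES_INR_TO_USD` standing in for `CurrencyConverter`, so it runs without the package; plain dicts play the role of pandas rows, since both support `row['col']` lookups.

```python
from datetime import date

# Hypothetical INR->USD rates keyed by the first day of each month
# (made-up values standing in for CurrencyConverter's historical rates).
RATES_INR_TO_USD = {
    date(2020, 1, 1): 0.0125,
    date(2020, 2, 1): 0.0120,
}

def convert_amount(sales, year, month):
    """Core logic on plain values, reusable from pandas or from a Spark UDF."""
    rate = RATES_INR_TO_USD[date(year, month, 1)]
    return sales * rate

def convert_row(row):
    """pandas-style adapter: apply(axis=1) passes one row at a time."""
    return convert_amount(row['Sales'], row['Calendar year'], row['Calendar month'])

# Dicts standing in for pandas rows.
rows = [
    {'Sales': 1000.0, 'Calendar year': 2020, 'Calendar month': 1},
    {'Sales': 500.0, 'Calendar year': 2020, 'Calendar month': 2},
]
print([convert_row(r) for r in rows])
```

With this split, the column-argument function `convert_amount` already has the shape a Spark UDF expects, and only the thin row adapter is pandas-specific.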

Can someone point me to an example of converting this to a PySpark DataFrame? Basically, I want to apply a function to a PySpark DataFrame column. Thanks.

Search for UDFs (user-defined functions). - rock321987

1 Answer


Yes, thanks, I managed to get it working as below. Sharing the solution in case it is useful to someone.

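from datetime import date
from currency_converter import CurrencyConverter
from pyspark.sql.functions import udf
from pyspark.sql.types import DoubleType

c = CurrencyConverter()

def convert_USD_INR(sales, year, month):
    # Plain Python function; Spark calls it once per row with the column values
    USD_amount = c.convert(sales, 'INR', 'USD', date=date(year, month, 1))
    return USD_amount

# Wrap the function as a UDF that produces a double column
convert_USD_INR_udf = udf(convert_USD_INR, DoubleType())

salesData = salesData.withColumn('Sales(INR)', convert_USD_INR_udf(salesData['sales'], salesData['year'], salesData['month']))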
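A detail worth checking before wrapping a function with `udf`: Spark passes SQL `NULL` values to a Python UDF as `None`, and a `None` return value becomes `NULL` in the output column. A missing year or month would make `date(None, None, 1)` raise a `TypeError` inside the executor, so a guard like the one below may help. Returning a Python `float` also matters for a `DoubleType()` UDF, since a mismatched type such as `int` can silently come back as `NULL`. Because `udf` wraps an ordinary Python function, that function can be checked directly without a Spark session. The sketch uses a hypothetical fixed-rate table `RATES_INR_TO_USD` in place of `CurrencyConverter`; with the real library, the `c.convert(...)` call noted in the comment would replace the lookup.

```python
from datetime import date

# Hypothetical INR->USD monthly rate (made-up value standing in for CurrencyConverter).
RATES_INR_TO_USD = {date(2020, 1, 1): 0.0125}

def convert_USD_INR(sales, year, month):
    # Spark hands NULL column values to a Python UDF as None;
    # returning None yields NULL in the new column instead of failing the task.
    if sales is None or year is None or month is None:
        return None
    # With the real library this would be:
    # return float(c.convert(sales, 'INR', 'USD', date=date(year, month, 1)))
    # float() keeps the result consistent with the UDF's DoubleType() return type.
    return float(sales) * RATES_INR_TO_USD[date(int(year), int(month), 1)]

# The plain function can be exercised without Spark before wrapping it with
# udf(convert_USD_INR, DoubleType()).
print(convert_USD_INR(1000, 2020, 1))
print(convert_USD_INR(None, 2020, 1))
```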