I am trying to create a new column ("newaggCol") in a Spark DataFrame using groupBy and sum, with PySpark 1.5. My numeric columns have been cast to Long or Double, and the columns used to form the groupBy are String and Timestamp.
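For reference, a minimal version of my setup looks roughly like this (the sample rows and the doubleCol name are made up; the other column names match my code below):

from datetime import datetime
from pyspark import SparkContext
from pyspark.sql import SQLContext

sc = SparkContext()
sqlContext = SQLContext(sc)
# hypothetical sample rows; under schema inference, ints become Long,
# floats become Double, and datetimes become Timestamp
df = sqlContext.createDataFrame(
    [("a", datetime(2015, 10, 1), 10, 1.5),
     ("a", datetime(2015, 10, 1), 20, 2.5),
     ("b", datetime(2015, 10, 2), 30, 3.5)],
    ["strCol", "tsCol", "longCol", "doubleCol"])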
My code for the new column is as follows:

df = df.withColumn("newaggCol", df.groupBy([df.strCol, df.tsCol]).sum(df.longCol))
The traceback points to that line and states:
ValueError: Cannot convert column into bool: please use '&' for 'and', '|' for 'or', '~' for 'not' when building DataFrame boolean expressions.
I assume I must be calling the functions incorrectly? What is the right way to compute this grouped sum and attach it as a new column on the original DataFrame?
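In case it helps, my current guess is that groupBy().sum() returns a new, aggregated DataFrame rather than a Column, so withColumn cannot accept it. Here is a sketch of what I think might work instead, aggregating separately and joining back (untested; I am assuming F.sum() accepts a column name as a string, and I rename the join keys to sidestep ambiguity in the self-join):

from pyspark.sql import functions as F

# aggregate per (strCol, tsCol) group, naming the result explicitly
aggDf = (df.groupBy("strCol", "tsCol")
           .agg(F.sum("longCol").alias("newaggCol"))
           .withColumnRenamed("strCol", "strCol2")
           .withColumnRenamed("tsCol", "tsCol2"))

# join the per-group sums back onto the original rows,
# then drop the duplicated key columns
df = (df.join(aggDf, (df.strCol == aggDf.strCol2) &
                     (df.tsCol == aggDf.tsCol2))
        .drop("strCol2")
        .drop("tsCol2"))

Is this the idiomatic approach, or is there a direct way to do it with withColumn?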