For each row in table 1, I am trying to count the rows that are in table 2 and meet the condition based on value from table 1.
The age from table 1 should be between the StartAge and EndAge of table 2, OR equal to either StartAge and EndAge.
Will this be possible using a udf and withColumn? I've tried couple of ways to do it such as using withColumn and withColumn with a UDF, but both ways fail.
def counter(a):
return table2.where((table2.StartAge <= a) & (table2.EndAge >=a)).count()
counter_udf = udf(lambda age: counter(age), IntegerType())
table1 = table1.withColumn('Count', counter_udf('Age ID'))
Does this make sense? Thanks.
Example input and output:
