I'm trying to join two PySpark dataframes on a key, with the additional condition that the date in the first table must always come after the date in the second table. As an example, here are the two tables I'm trying to join:
Table 1:
Date1        value1  key
13 Feb 2020  1       a
01 Mar 2020  2       a
31 Mar 2020  3       a
15 Apr 2020  4       a
Table 2:
Date2        value2  key
10 Feb 2020  11      a
15 Mar 2020  22      a
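For reproducibility, here is a minimal sketch of how the example data above can be built (column names Date1/value1/key and Date2/value2/key follow the tables, and I'm assuming the dates parse with a "dd MMM yyyy" pattern):

from pyspark.sql import SparkSession
import pyspark.sql.functions as F

spark = SparkSession.builder.getOrCreate()

# Table 1: dates, values and the join key
df1 = spark.createDataFrame(
    [("13 Feb 2020", 1, "a"),
     ("01 Mar 2020", 2, "a"),
     ("31 Mar 2020", 3, "a"),
     ("15 Apr 2020", 4, "a")],
    ["Date1", "value1", "key"],
).withColumn("Date1", F.to_date("Date1", "dd MMM yyyy"))

# Table 2
df2 = spark.createDataFrame(
    [("10 Feb 2020", 11, "a"),
     ("15 Mar 2020", 22, "a")],
    ["Date2", "value2", "key"],
).withColumn("Date2", F.to_date("Date2", "dd MMM yyyy"))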
After the join, the result should be something like this (each row of Table 2 should only be matched to the first Date1 that comes after its Date2; the remaining rows of Table 1 get null for value2):
Date1        value1  value2  key
13 Feb 2020  1       11      a
01 Mar 2020  2       null    a
31 Mar 2020  3       22      a
15 Apr 2020  4       null    a
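To show why this isn't a plain conditional join: joining on the key with Date1 > Date2 matches every earlier Date2 to each Date1, so 31 Mar and 15 Apr would each come back twice (once per row of Table 2), and 01 Mar would get 11 instead of null. That's the behaviour I'm trying to avoid:

# Naive conditional left join: each Date1 picks up *all* earlier Date2 rows,
# producing duplicate rows for 31 Mar 2020 and 15 Apr 2020.
naive = df1.join(
    df2,
    on=[df1.key == df2.key, df1.Date1 > df2.Date2],
    how="left",
)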
Any ideas?