I have a PySpark DataFrame with the following data:
Original data:
Shop  Customer  date        retrive_days
A     C1        15/06/2019  2
A     C1        16/06/2019  0
A     C1        17/06/2019  0
A     C1        18/06/2019  0
B     C2        20/07/2019  5
B     C2        21/07/2019  0
B     C2        23/07/2019  0
B     C2        30/07/2019  0
B     C2        01/08/2019  6
B     C2        02/08/2019  0
B     C2        03/08/2019  0
B     C2        09/08/2019  0
B     C2        10/08/2019  1
B     C2        11/08/2019  0
B     C2        13/08/2019  0
Each row records a date on which the customer visited the shop. A row with a non-zero retrive_days value says how many days of data, counting from that row's date, have to be fetched into the output.
I am trying to filter the DataFrame in PySpark based on each customer's retrive_days values, so that the output looks like this:
Expected Output:
Shop  Customer  date        retrive_days
A     C1        15/06/2019  2
A     C1        16/06/2019  0
B     C2        20/07/2019  5
B     C2        21/07/2019  0
B     C2        23/07/2019  0
B     C2        01/08/2019  6
B     C2        02/08/2019  0
B     C2        03/08/2019  0
B     C2        10/08/2019  1
B     C2        11/08/2019  0