I am trying to join two datatables using rolling join. I have looked at various answers including here but unfortunately unable to locate one that helps in this case. I am borrowing the same example from the link posted.
my first dataset is a websession data for two users 1 and 2:
user web_date_time
1 29-Oct-2016 6:10:03 PM
1 29-Oct-2016 7:34:17 PM
1 30-Oct-2016 2:08:03 PM
1 30-Oct-2016 3:55:12 PM
2 31-Oct-2016 11:32:12 AM
2 31-Oct-2016 2:59:56 PM
2 01-Nov-2016 12:49:44 PM
My second time stamp is for purchase:
user purchase_date_time
1 29-Oct-2016 6:10:00 PM
1 29-Oct-2016 6:11:00 PM
2 31-Oct-2016 11:35:12 AM
2 31-Oct-2016 2:50:00 PM
My desired output is which web session led to a purchase but with a constraint. The constraint is - the websession should be after the previous purchase. The desired out is as follows (requires for all purchases, an additional column "websession_led_purchase" to be created ):
user purchase_date_time websession_led_purchase
1 29-Oct-2016 6:10:00 PM NA
1 29-Oct-2016 6:11:00 PM 29-Oct-2016 6:10:03 PM
2 31-Oct-2016 11:35:12 AM 31-Oct-2016 11:32:12 AM
2 31-Oct-2016 2:50:00 PM NA
The first NA is on account of no websession before that purchase, the second NA is on account of no websession after the previous purchase (and before the purchase) that led to the second purchase for user 2.
I tried using the roll join method of dt2[dt1,roll=Inf], however, I get "31-Oct-2016 11:32:12 AM" for the fourth row in the desired output, which is incorrect.
Let me know your advice.
29-Oct-2016 6:10:00 PM
as first row in desired row? Why it cannot be29-Oct-2016 6:10:59 PM
? – MKRThe constraint is - the websession should be after the previous purchase
is not true for 2nd row in desired output. – MKR