I have two data sources: one contains a list of API calls and the other contains all related authentication events. There can be multiple auth events for each API call, and I want to find the auth event that:
a) has the same "identifier" as the API call,
b) happened within one second after the API call, and
c) is the closest to the API call among the events that pass a) and b).
I had planned to loop through each API call in a FOREACH and then use FILTER statements on the auth events to find the correct one. However, this does not appear to be possible (see "USING Filter in a Nested FOREACH in PIG").
Could anyone suggest other ways to achieve this? If it helps, here's the Pig script I tried to use:
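The restriction comes from how a nested FOREACH works: inside the braces, FILTER, ORDER, LIMIT and similar operators can only act on bags that are fields of the record currently being processed, so a separate top-level relation such as authEvents cannot be referenced there. A minimal sketch of the form Pig does accept, assuming the same AuthEvents.txt layout as in the script below (the aliases byIdentifier, earliestPerIdentifier and earliest are illustrative), first groups the relation so that each record carries its own bag of auth events:

```pig
-- Assumed: same layout as AuthEvents.txt in the question.
authEvents = LOAD '/Documents/AuthEvents.txt' AS (auth_fileName:chararray, auth_requestTime:long, auth_timeFromLog:chararray, auth_call:chararray, auth_leadString:chararray, auth_xmlPayload:chararray, auth_sourceIp:chararray, auth_username:chararray, auth_identifier:chararray);

-- GROUP yields one record per identifier: (group, {bag of authEvents tuples with that identifier})
byIdentifier = GROUP authEvents BY auth_identifier;

earliestPerIdentifier = FOREACH byIdentifier {
    -- Allowed: 'authEvents' here is the bag field of the current record,
    -- not the top-level relation of the same name.
    sorted = ORDER authEvents BY auth_requestTime;
    earliest = LIMIT sorted 1;
    GENERATE FLATTEN(earliest);
};

DUMP earliestPerIdentifier;
```

This only works on one relation at a time; to compare auth events against API calls, both sides have to end up in the same record first (for example via JOIN).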
apiRequests = LOAD '/Documents/ApiRequests.txt' AS (api_fileName:chararray, api_requestTime:long, api_timeFromLog:chararray, api_call:chararray, api_leadString:chararray, api_xmlPayload:chararray, api_sourceIp:chararray, api_username:chararray, api_identifier:chararray);
authEvents = LOAD '/Documents/AuthEvents.txt' AS (auth_fileName:chararray, auth_requestTime:long, auth_timeFromLog:chararray, auth_call:chararray, auth_leadString:chararray, auth_xmlPayload:chararray, auth_sourceIp:chararray, auth_username:chararray, auth_identifier:chararray);
specificApiCall = FILTER apiRequests BY api_call == 'CSGetUser'; -- Get all events for this specific call
match = foreach specificApiCall { -- Now try to get the closest matching auth event
    filtered1 = filter authEvents by auth_identifier == api_identifier; -- Only use auth events that have the same identifier (this will return several)
    filtered2 = filter filtered1 by (auth_requestTime - api_requestTime) >= 0 and (auth_requestTime - api_requestTime) < 1000; -- Further refine to auth events within one second after the API call's time
    sorted = order filtered2 by auth_requestTime; -- Get the auth event that's closest to the API call
    limited = limit sorted 1;
    generate limited;
};
dump match;
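One way around the restriction, sketched here and not tested against real data, is to bring matching rows together with a JOIN first, so that criteria a), b) and c) become a JOIN, a FILTER, and a per-call GROUP with a nested ORDER/LIMIT. It assumes that (api_fileName, api_requestTime, api_identifier) uniquely identifies one API call; substitute whatever key actually does in your data. After the JOIN, fields can be referenced without the `relation::` prefix because the api_ and auth_ name prefixes keep them unambiguous.

```pig
apiRequests = LOAD '/Documents/ApiRequests.txt' AS (api_fileName:chararray, api_requestTime:long, api_timeFromLog:chararray, api_call:chararray, api_leadString:chararray, api_xmlPayload:chararray, api_sourceIp:chararray, api_username:chararray, api_identifier:chararray);
authEvents = LOAD '/Documents/AuthEvents.txt' AS (auth_fileName:chararray, auth_requestTime:long, auth_timeFromLog:chararray, auth_call:chararray, auth_leadString:chararray, auth_xmlPayload:chararray, auth_sourceIp:chararray, auth_username:chararray, auth_identifier:chararray);

specificApiCall = FILTER apiRequests BY api_call == 'CSGetUser';

-- a) pair every API call with every auth event that has the same identifier
joined = JOIN specificApiCall BY api_identifier, authEvents BY auth_identifier;

-- b) keep only auth events from 0 up to (but not including) 1000 ms after the API call
inWindow = FILTER joined BY (auth_requestTime - api_requestTime) >= 0
                        AND (auth_requestTime - api_requestTime) < 1000;

-- c) collect the candidates per API call (assumed key), then keep the earliest one
byApiCall = GROUP inWindow BY (api_fileName, api_requestTime, api_identifier);
closest = FOREACH byApiCall {
    sorted = ORDER inWindow BY auth_requestTime ASC;
    top = LIMIT sorted 1;
    GENERATE FLATTEN(top);
};

DUMP closest;
```

API calls with no auth event in the window drop out at the inner JOIN; a LEFT OUTER JOIN would keep them, but the FILTER would then need to handle the null auth fields explicitly. If two auth events share the same auth_requestTime, LIMIT 1 keeps an arbitrary one of them.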