I have a data source in which I need to return all pairs of events (event1, event2) from a single data source, where field1 from event1 matches field2 from event2.
For example, let's say I have the following data.
I need to return a pair of events where field id from event1, matches field referrer_id from event2. Let's say, to get the following report.
- Adam Anderson referred Betty Burger on 2016-01-02 08:00:00.000
- Adam Anderson referred Carol Camp on 2016-01-03 08:00:00.000
- Betty Burger referred Darren Dougan on 2016-01-04 08:00:00.000
In sql I can do this quite easily with the following command.
select a.first_name as first1, a.last_name as last1, b.first_name as first2,
b.last_name as last2, b.date as date
from myTable a
inner join myTable b on a.id = b.referrer_id;
Which returns the following table,
which gives exactly the data I need.
Now, I've been attempting to replicate this in a splunk query and have run into quite a few issues. First I attempted to use the transaction command, but that aggregated all of the related events together as opposed to matching them a pair at a time.
Next, I attempted to use a subsearch, first finding the id and then searching in the subsearch, first for the first event by id and the appending the second event by referral_id. Then, since append creates a new row instead of appending to the same row, using a stats to aggregate the resulting rows by the matching id field. I did attempt to use appendcols but that didn't return anything for me.
...
| table id
| map search="search id=$id$
| fields first_name, last_name, id
| rename first_name as first1
| rename last_name as last1
| rename id as match_id
| append [search $id$
| search referral_id=$id$
| fields first_name, last_name, referral_id, date
| rename first_name as first2
| rename last_name as span2
| rename referral_id as match_id]"
| fields first1, last1, first2, last2, match_id, time
| stats values(first1) as first1, values(last1) as last1, values(first2) as first2,
values(last2) as last2, values(time) as time by id
The above query works for me and gives me the table I need, but it is incredibly slow due to the repeated searches over the entire time frame, and also limited by the map maxsearches which, for whatever reason, cannot be set to unlimited.
This seems like an overly complicated solution, especially in comparison to the sql query. Surely there must exist a simpler, faster way that this can be done, which isn't limited by the arbitrary limited settings or the multiple repeating search queries. I would greatly appreciate any help.
join
? – PM 77-1