1
votes

I have a data source in which I need to return all pairs of events (event1, event2) from a single data source, where field1 from event1 matches field2 from event2.

For example, let's say I have the following data.

enter image description here

I need to return a pair of events where field id from event1, matches field referrer_id from event2. Let's say, to get the following report.

  • Adam Anderson referred Betty Burger on 2016-01-02 08:00:00.000
  • Adam Anderson referred Carol Camp on 2016-01-03 08:00:00.000
  • Betty Burger referred Darren Dougan on 2016-01-04 08:00:00.000

In sql I can do this quite easily with the following command.

select a.first_name as first1, a.last_name as last1, b.first_name as first2,
       b.last_name as last2, b.date as date
from myTable a
inner join myTable b on a.id = b.referrer_id;

Which returns the following table,

enter image description here

which gives exactly the data I need.

Now, I've been attempting to replicate this in a splunk query and have run into quite a few issues. First I attempted to use the transaction command, but that aggregated all of the related events together as opposed to matching them a pair at a time.

Next, I attempted to use a subsearch, first finding the id and then searching in the subsearch, first for the first event by id and the appending the second event by referral_id. Then, since append creates a new row instead of appending to the same row, using a stats to aggregate the resulting rows by the matching id field. I did attempt to use appendcols but that didn't return anything for me.

...
| table id
| map search="search id=$id$
  | fields first_name, last_name, id
  | rename first_name as first1
  | rename last_name as last1
  | rename id as match_id
  | append [search $id$
    | search referral_id=$id$
    | fields first_name, last_name, referral_id, date
    | rename first_name as first2
    | rename last_name as span2
    | rename referral_id as match_id]"
| fields first1, last1, first2, last2, match_id, time
| stats values(first1) as first1, values(last1) as last1, values(first2) as first2,
  values(last2) as last2, values(time) as time by id

The above query works for me and gives me the table I need, but it is incredibly slow due to the repeated searches over the entire time frame, and also limited by the map maxsearches which, for whatever reason, cannot be set to unlimited.

This seems like an overly complicated solution, especially in comparison to the sql query. Surely there must exist a simpler, faster way that this can be done, which isn't limited by the arbitrary limited settings or the multiple repeating search queries. I would greatly appreciate any help.

1
Have you considered join?PM 77-1
I've considered join. It didn't return all the results I needed - it would only return two rows instead of three, matching A on B only, instead of matching A on B and on C.user3037540

1 Answers

0
votes

I ended up using append. Using join gave me faster results, but didn't result in every matching pair, for my example it would return 2 rows instead of three, returning Adam with Betty, but not returning Adam with Carol.

Using append returned a full list, and using stats by id gave me the result I was looking for, a full list of each matching pair. It also gave extra empty fields, so I had to remove those, and then manipulate the resulting mv's into their own individual rows. Splunk doesn't offer a multifield mv expand, so I used a workaround.

...
| rename id as matchId, first_name as first1, last_name as last1
| table matchId, first1, last1
| append [
  search ... 
  | rename referrer_id as matchId, first_name as first2, last_name as last2
  | table matchId, first2, last2, date]
| stats list(first1) as first1, list(last1) as last1, list(first2) as first2, list(last2) as last2, list(date) as date by matchId
| search first1!=null last1!=null first2!=null last2!=null
| eval zipped=mvzip(mvzip(first2, last2, ","), date, ",")
| mvexpand zipped
| makemv zipped delim=","
| eval first2=mvindex(zipped, 0)
| eval last2=mvindex(zipped, 1)
| eval date=mvindex(zipped, 2)
| fields - zipped

This is faster than using a map with multiple subsearches, and gives all the of results. It is still limited by the maximum size of the subsearch, but at least provides the necessary data.