0
votes

I have the following lookup:

C1 | C2| C3| C4
===|===|===|====
A  | 1 | x | test
===|===|===|====
A  | 2 | y | test
===|===|===|====
B  | 1 | x | test
===|===|===|====
B  | 1 | y | test
===|===|===|====
B  | 1 | z | test

I want this to be converted to:

C1 | C2| C3| C4
===|===|===|====
B  | 1 | x | test
===|===|===|====
B  | 1 | y | test
===|===|===|====
B  | 1 | z | test

So the idea is that if for unique value of C1, if there are multiple values in C2, the such combinations of C1 + C2 should be filtered out.

What I tried was:

| inputlookup LUT.csv
| fillnull value="NULL"
| stats  list(*) as * dc("C2") as count by  "C1"
| where count=1

but this results in:

C1 | C2    | C3    | C4
===|=======|=======|==============
B  | 1,1,1 | x,y,z | test,test,test

I DONOT want comma separated values. I want different row.

1

1 Answers

0
votes

Here's a run-anywhere example. It uses streamstats and stats to count the number of variations of C1/C2 pairs. Then filters out anything with a single C1/C2 pair. Finally, mvexpand separates the multi-value C3 field into different events.

| makeresults | eval _raw="C1  C2 C3 C4
A   1  x  test
A   2  y  test
B   1  x  test
B   1  y  test
B   1  z  test" | multikv
`comment("Everything above is just test data")`
| streamstats reset_on_change=true count by C1,C2 
| stats values(*) as * max(count) as max by C1
| where max > 1
| mvexpand C3
| table C1 C2 C3