I have a table where I want to filter down to only those rows where the group that row belongs to is found to have only a single unique value.
For example, if I am grouping the following data by the index
column, I want to count the unique values of foo
in each group and then drop any rows where there is more than just one value of foo
Example Input:
index foo
0 A 10
1 A 20
2 A 30
3 B 10
4 B 20
5 B 30
6 C 10
7 C 20
8 C 30
9 D 10
10 D 10
11 D 10
12 E 10
13 E 20
14 E 30
15 F 10
16 F 10
17 F 10
Expected Output:
index foo
0 D 10
1 D 10
2 D 10
3 F 10
4 F 10
5 F 10
I am able to do this in Python but am having trouble implementing the same solution in DAX (Power BI).
This is my Python approach:
import pandas as pd
data = pd.DataFrame({'index': {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B", 6: "C",
7: "C", 8: "C", 9: "D", 10: "D", 11: "D", 12: "E",
13: "E", 14: "E", 15: "F", 16: "F", 17: "F"},
'foo': {0: 10, 1: 20, 2: 30, 3: 10, 4: 20, 5: 30, 6: 10,
7: 20, 8: 30, 9: 10, 10: 10, 11: 10, 12: 10,
13: 20, 14: 30, 15: 10, 16: 10, 17: 10}})
# count distinct values of "foo"
foo_distinct = data.groupby('index').agg(lambda x: len(set(x)))
# add the distinct value counts of foo to main table
joined_data = data.set_index('index').join(foo_distinct, rsuffix='_distinctgroupcount')
# filter out any values where there is more than a single distinct value
filtered_data = joined_data.loc[lambda df: df['foo_distinctgroupcount'] == 1])
And my attempt in DAX. The calculation of foo_distinct
works as expected but I am having trouble getting the distinct group counts merged into the original data set (the attempt to get joined_data
does not work).
foo_distinct = SUMMARIZE(data, data[index], "foo_distinctgroupcount", DISTINCTCOUNT(data[foo]))
joined_data = CALCULATETABLE(NATURALLEFTOUTERJOIN(data, foo_distinct), USERELATIONSHIP(data[index], foo_distinct[index]))
My questions:
- Is this problem best solved by groupby, distinctcount, and joining those values back in to the original data or is there a more efficient/elegant approach?
- I believe I see that
NATURALLEFTOUTERJOIN
can't be done sincedata
andfoo_distinct
do not share a relationship (but they should since they are derived from the same source; however this function does not accept aUSERELATIONSHIP
call as far as I can tell. Is there a function that is more flexible for handling this? For example SQL would have theON
keyword available.
groupby
+transform
. Transform broadcasts the aggregated result back to every row that group belonged to in the original DataFrame, making it an easy and efficient way to filter entire groups from your DataFrame based on an aggregated result. – ALollz