1
votes

I have a "long" dataset that looks like this

ID     Sport
123456 Basketball
123456 Baseball   
123456 Football
123456 Basketball
234567 Football
234567 Tennis
234567 Tennis
234567 Tennis
345678 Basketball
345678 Tennis
345678 Wrestling
345678 Volleyball

I created a measure using DISTINCTCOUNT() that is able to count unique associations with ID so if I were to show the measure in a table it would look like this

ID      Count
123456  3
234567  2
345678  4

Note for 123456 the unique associations are Basketball, Baseball, and Football hence three. For 234567 it is two, Football and Tennis. For 345678 it is all unique hence 4.

Based on those values, created another measure that gets the average which is 3. 3+2+4=9/3=3.

I want to create a measure that gives the count value of the ID and its σ(SD) from the mean. For example if i had a value of 3.81649658(3+1sd(.81649658)) it would be 1. If i had 2.18350342 it would be -1 (3-1sd).

345678 should have slightly above 1 and 234567 should be slightly below 1.

I have a distinct count measure, average measure, and SD measure.

1

1 Answers

1
votes

Z-Score is simply

(raw score - average score) / (standard deviation)

Suppose we have the following measures:

DCount  = DISTINCTCOUNT ( Sports[Sport] )

Average = AVERAGEX ( ALLSELECTED ( Sports[ID] ), [DCount] )

StdDev  = STDEVX.P ( ALLSELECTED ( Sports[ID] ), [DCount] )

I'm using ALLSELECTED so that it looks at all of the selected ID values, not just the single one in the current row filter context from the visual. I use the iterators AVERAGEX and STDDEVX.P since we're not working with an existing table column but rather a sequence of measure values while grouping on the ID column.

Then the Z-score can be defined as

Z-Score = DIVIDE ( [DCount] - [Average], [StdDev] )

Matrix visual screenshot